LSHTM_analysis/scripts/ml/log_gid_sl.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_sl.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerial columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 531

PASS: my_features_df and aa_df successfully combined
nrows: 531
ncols: 286
count of NULL values before imputation

or_mychisq          263
log10_or_mychisq    263
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123

No. of numerical features: 167
No. of categorical features: 7

PASS: x_features has no target variable

No. of columns for x_features: 174

-------------------------------------------------------------
Successfully split data according to scaling law: 1/np.sqrt(x_ncols)
Train data size: (109, 174)
Test data size: 0.07580980435789034      (10, 174)
y_train numbers: Counter({0: 70, 1: 39})
y_train ratio: 1.794871794871795

y_test_numbers: Counter({0: 6, 1: 4})
y_test ratio: 1.5
-------------------------------------------------------------

Simple Random OverSampling
 Counter({0: 70, 1: 70})
(140, 174)

Simple Random UnderSampling
 Counter({0: 39, 1: 39})
(78, 174)

Simple Combined Over and UnderSampling
 Counter({0: 70, 1: 70})
(140, 174)

SMOTE_NC OverSampling
 Counter({0: 70, 1: 70})
(140, 174)

#####################################################################

Running ML analysis: scaling law split
Gene name: gid
Drug name: streptomycin

Output directory: /home/tanu/git/Data/streptomycin/output/ml/tts_sl/
Sanity checks:
ML source data size: (119, 174)
Total input features: (109, 174)
Target feature numbers: Counter({0: 70, 1: 39})
Target features ratio: 1.794871794871795

#####################################################################


================================================================

Strucutral features (n): 35
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02453518 0.030195   0.02754903 0.02711487 0.0307312  0.02705741
 0.02809882 0.02859855 0.04155993 0.03637624]

mean value: 0.030181622505187987

key: score_time
value: [0.01295424 0.01227617 0.01204991 0.01202321 0.01229239 0.01206994
 0.01218534 0.01203108 0.01242352 0.01243353]

mean value: 0.012273931503295898

key: test_mcc
value: [0.         0.21428571 0.21428571 0.21428571 1.         0.38575837
 0.44854261 0.62360956 0.13363062 0.52380952]

mean value: 0.37582078405629626

key: train_mcc
value: [0.86978258 0.86680492 0.88878629 0.86680492 0.8449259  0.86680492
 0.89113279 0.8449259  0.84852814 0.86933707]

mean value: 0.8657833441636803

key: test_accuracy
value: [0.63636364 0.63636364 0.63636364 0.63636364 1.         0.72727273
 0.72727273 0.81818182 0.63636364 0.8       ]

mean value: 0.7254545454545455

key: train_accuracy
value: [0.93877551 0.93877551 0.94897959 0.93877551 0.92857143 0.93877551
 0.94897959 0.92857143 0.92857143 0.93939394]

mean value: 0.9378169449598022
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

key: test_fscore
value: [0.         0.5        0.5        0.5        1.         0.57142857
 0.66666667 0.66666667 0.33333333 0.66666667]

mean value: 0.5404761904761904

key: train_fscore
value: [0.90625    0.90909091 0.92537313 0.90909091 0.89230769 0.90909091
 0.92307692 0.89230769 0.88888889 0.91176471]

mean value: 0.9067241764064634

key: test_precision
value: [0.         0.5        0.5        0.5        1.         0.66666667
 0.6        1.         0.5        0.66666667]

mean value: 0.5933333333333333

key: train_precision
value: [1.         0.96774194 0.96875    0.96774194 0.96666667 0.96774194
 1.         0.96666667 1.         0.96875   ]

mean value: 0.9774059139784946

key: test_recall
value: [0.         0.5        0.5        0.5        1.         0.5
 0.75       0.5        0.25       0.66666667]

mean value: 0.5166666666666666

key: train_recall
value: [0.82857143 0.85714286 0.88571429 0.85714286 0.82857143 0.85714286
 0.85714286 0.82857143 0.8        0.86111111]

mean value: 0.8461111111111111

key: test_roc_auc
value: [0.5        0.60714286 0.60714286 0.60714286 1.         0.67857143
 0.73214286 0.75       0.55357143 0.76190476]

mean value: 0.6797619047619048

key: train_roc_auc
value: [0.91428571 0.92063492 0.93492063 0.92063492 0.90634921 0.92063492
 0.92857143 0.90634921 0.9        0.92261905]

mean value: 0.9175000000000001

key: test_jcc
value: [0.         0.33333333 0.33333333 0.33333333 1.         0.4
 0.5        0.5        0.2        0.5       ]

mean value: 0.41

key: train_jcc
value: [0.82857143 0.83333333 0.86111111 0.83333333 0.80555556 0.83333333
 0.85714286 0.80555556 0.8        0.83783784]

mean value: 0.8295774345774346

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.6318655  0.6620369  0.63318753 0.63215232 0.62510443 0.76653695
 0.64627314 0.66265655 1.02622437 0.67012167]

mean value: 0.6956159353256226

key: score_time
value: [0.01334572 0.0123713  0.0124464  0.01250243 0.0123589  0.01255822
 0.01497769 0.01635933 0.01404405 0.01240587]

mean value: 0.013336992263793946

key: test_mcc
value: [0.41833001 0.38575837 0.21428571 0.21428571 0.82807867 0.38575837
 0.69006556 0.81009259 0.38575837 0.52380952]

mean value: 0.48562229082178415

key: train_mcc
value: [0.95595301 0.93314074 0.91089125 0.75799004 0.91089125 0.88878629
 0.95595301 1.         1.         0.91253747]

mean value: 0.9226143059319446

key: test_accuracy
value: [0.72727273 0.72727273 0.63636364 0.63636364 0.90909091 0.72727273
 0.81818182 0.90909091 0.72727273 0.8       ]

mean value: 0.7618181818181818

key: train_accuracy
value: [0.97959184 0.96938776 0.95918367 0.8877551  0.95918367 0.94897959
 0.97959184 1.         1.         0.95959596]

mean value: 0.9643269428983714

key: test_fscore
value: [0.4        0.57142857 0.5        0.5        0.88888889 0.57142857
 0.8        0.85714286 0.57142857 0.66666667]

mean value: 0.6326984126984128

key: train_fscore
value: [0.97058824 0.95652174 0.94117647 0.81967213 0.94117647 0.92537313
 0.97058824 1.         1.         0.94285714]

mean value: 0.9467953559228183

key: test_precision
value: [1.         0.66666667 0.5        0.5        0.8        0.66666667
 0.66666667 1.         0.66666667 0.66666667]

mean value: 0.7133333333333334

key: train_precision
value: [1.         0.97058824 0.96969697 0.96153846 0.96969697 0.96875
 1.         1.         1.         0.97058824]

mean value: 0.9810858871520636

key: test_recall
value: [0.25       0.5        0.5        0.5        1.         0.5
 1.         0.75       0.5        0.66666667]

mean value: 0.6166666666666667

key: train_recall
value: [0.94285714 0.94285714 0.91428571 0.71428571 0.91428571 0.88571429
 0.94285714 1.         1.         0.91666667]

mean value: 0.9173809523809524

key: test_roc_auc
value: [0.625      0.67857143 0.60714286 0.60714286 0.92857143 0.67857143
 0.85714286 0.875      0.67857143 0.76190476]

mean value: 0.7297619047619047

key: train_roc_auc
value: [0.97142857 0.96349206 0.94920635 0.84920635 0.94920635 0.93492063
 0.97142857 1.         1.         0.95039683]

mean value: 0.9539285714285715

key: test_jcc
value: [0.25       0.4        0.33333333 0.33333333 0.8        0.4
 0.66666667 0.75       0.4        0.5       ]

mean value: 0.48333333333333334

key: train_jcc
value: [0.94285714 0.91666667 0.88888889 0.69444444 0.88888889 0.86111111
 0.94285714 1.         1.         0.89189189]

mean value: 0.9027606177606178

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01464987 0.00926733 0.00862837 0.00862694 0.0085156  0.00847125
 0.00865388 0.0086422  0.00871873 0.00910592]

mean value: 0.009328007698059082

key: score_time
value: [0.0125277  0.00906873 0.00880456 0.00855637 0.00875044 0.0086751
 0.00854564 0.00900102 0.00869846 0.00857162]

mean value: 0.009119963645935059

key: test_mcc
value: [-0.06900656 -0.13363062  0.03857584  0.06900656  0.3105295  -0.21428571
  0.17857143  0.21428571  0.21428571  0.32732684]

mean value: 0.09356586964495016

key: train_mcc
value: [0.30219324 0.40233433 0.47469288 0.66470299 0.35901099 0.38729833
 0.34426519 0.47336463 0.32995002 0.42431986]

mean value: 0.41621324666625875

key: test_accuracy
value: [0.45454545 0.36363636 0.45454545 0.54545455 0.63636364 0.36363636
 0.54545455 0.63636364 0.63636364 0.5       ]

mean value: 0.5136363636363637

key: train_accuracy
value: [0.63265306 0.68367347 0.7244898  0.84693878 0.64285714 0.66326531
 0.64285714 0.70408163 0.63265306 0.6969697 ]

mean value: 0.68704390847248

key: test_fscore
value: [0.4        0.46153846 0.5        0.44444444 0.6        0.36363636
 0.54545455 0.5        0.5        0.54545455]

mean value: 0.48605283605283606

key: train_fscore
value: [0.59090909 0.64367816 0.68235294 0.7826087  0.62365591 0.63736264
 0.61538462 0.68131868 0.60869565 0.65909091]

mean value: 0.6525057297966527

key: test_precision
value: [0.33333333 0.33333333 0.375      0.4        0.5        0.28571429
 0.42857143 0.5        0.5        0.375     ]

mean value: 0.40309523809523806

key: train_precision
value: [0.49056604 0.53846154 0.58       0.79411765 0.5        0.51785714
 0.5        0.55357143 0.49122807 0.55769231]

mean value: 0.5523494172552529

key: test_recall
value: [0.5  0.75 0.75 0.5  0.75 0.5  0.75 0.5  0.5  1.  ]

mean value: 0.65

key: train_recall
value: [0.74285714 0.8        0.82857143 0.77142857 0.82857143 0.82857143
 0.8        0.88571429 0.8        0.80555556]

mean value: 0.8091269841269841

key: test_roc_auc
value: [0.46428571 0.44642857 0.51785714 0.53571429 0.66071429 0.39285714
 0.58928571 0.60714286 0.60714286 0.64285714]

mean value: 0.5464285714285714

key: train_roc_auc
value: [0.65714286 0.70952381 0.74761905 0.83015873 0.68412698 0.7
 0.67777778 0.74444444 0.66984127 0.7202381 ]

mean value: 0.7140873015873016

key: test_jcc
value: [0.25       0.3        0.33333333 0.28571429 0.42857143 0.22222222
 0.375      0.33333333 0.33333333 0.375     ]

mean value: 0.32365079365079363

key: train_jcc
value: [0.41935484 0.47457627 0.51785714 0.64285714 0.453125   0.46774194
 0.44444444 0.51666667 0.4375     0.49152542]

mean value: 0.48656488659341995

MCC on Blind test: 0.67

Accuracy on Blind test: 0.8

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00878239 0.00921321 0.00866842 0.00868893 0.00864553 0.00862384
 0.00863862 0.0087378  0.00851679 0.00869918]

mean value: 0.008721470832824707

key: score_time
value: [0.00925446 0.00853109 0.00853133 0.00859833 0.00865149 0.00865412
 0.00859571 0.00850534 0.0085783  0.00853562]

mean value: 0.008643579483032227

key: test_mcc
value: [ 0.41833001 -0.03857584 -0.35634832  0.13363062  0.81009259  0.13363062
  0.44854261 -0.35634832 -0.23904572  0.04761905]

mean value: 0.10015272992148226

key: train_mcc
value: [0.58972429 0.61549072 0.62652663 0.56504712 0.54087139 0.5890183
 0.64246907 0.61976141 0.58972429 0.58367897]

mean value: 0.5962312182242981

key: test_accuracy
value: [0.72727273 0.54545455 0.45454545 0.63636364 0.90909091 0.63636364
 0.72727273 0.45454545 0.54545455 0.6       ]

mean value: 0.6236363636363637

key: train_accuracy
value: [0.81632653 0.82653061 0.82653061 0.80612245 0.79591837 0.81632653
 0.83673469 0.82653061 0.81632653 0.80808081]

mean value: 0.8175427746856319

key: test_fscore
value: [0.4        0.28571429 0.         0.33333333 0.85714286 0.33333333
 0.66666667 0.         0.         0.33333333]

mean value: 0.32095238095238093

key: train_fscore
value: [0.7        0.71186441 0.69090909 0.68852459 0.66666667 0.70967742
 0.72413793 0.70175439 0.7        0.66666667]

mean value: 0.6960201157540253

key: test_precision
value: [1.         0.33333333 0.         0.5        1.         0.5
 0.6        0.         0.         0.33333333]

mean value: 0.42666666666666664

key: train_precision
value: [0.84       0.875      0.95       0.80769231 0.8        0.81481481
 0.91304348 0.90909091 0.84       0.9047619 ]

mean value: 0.8654403414620806

key: test_recall
value: [0.25       0.25       0.         0.25       0.75       0.25
 0.75       0.         0.         0.33333333]

mean value: 0.2833333333333333

key: train_recall
value: [0.6        0.6        0.54285714 0.6        0.57142857 0.62857143
 0.6        0.57142857 0.6        0.52777778]

mean value: 0.5842063492063492

key: test_roc_auc
value: [0.625      0.48214286 0.35714286 0.55357143 0.875      0.55357143
 0.73214286 0.35714286 0.42857143 0.52380952]

mean value: 0.5488095238095239

key: train_roc_auc
value: [0.76825397 0.77619048 0.76349206 0.76031746 0.74603175 0.77460317
 0.78412698 0.76984127 0.76825397 0.74801587]

mean value: 0.7659126984126984

key: test_jcc
value: [0.25       0.16666667 0.         0.2        0.75       0.2
 0.5        0.         0.         0.2       ]

mean value: 0.22666666666666668

key: train_jcc
value: [0.53846154 0.55263158 0.52777778 0.525      0.5        0.55
 0.56756757 0.54054054 0.53846154 0.5       ]

mean value: 0.5340440541756332

MCC on Blind test: 0.36

Accuracy on Blind test: 0.7

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00848818 0.01176858 0.00879884 0.00929117 0.0093472  0.00941753
 0.0095644  0.00954175 0.00954604 0.0095284 ]

mean value: 0.009529209136962891

key: score_time
value: [0.05046916 0.03067589 0.01002026 0.01012683 0.0101347  0.0102005
 0.01021075 0.01025558 0.01041079 0.01048422]

mean value: 0.016298866271972655

key: test_mcc
value: [-0.23904572  0.13363062  0.38575837 -0.03857584  0.81009259  0.13363062
 -0.03857584  0.38575837 -0.03857584  0.21821789]

mean value: 0.1712315234921411

key: train_mcc
value: [0.49201849 0.44316559 0.36232865 0.44248774 0.36787951 0.49201849
 0.41560471 0.46664388 0.49368586 0.45456865]

mean value: 0.4430401554934932

key: test_accuracy
value: [0.54545455 0.63636364 0.72727273 0.54545455 0.90909091 0.63636364
 0.54545455 0.72727273 0.54545455 0.7       ]

mean value: 0.6518181818181819

key: train_accuracy
value: [0.7755102  0.75510204 0.7244898  0.75510204 0.7244898  0.7755102
 0.74489796 0.76530612 0.7755102  0.75757576]

mean value: 0.7553494124922696

key: test_fscore
value: [0.         0.33333333 0.57142857 0.28571429 0.85714286 0.33333333
 0.28571429 0.57142857 0.28571429 0.4       ]

mean value: 0.3923809523809524

key: train_fscore
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[0.63333333 0.6        0.50909091 0.55555556 0.54237288 0.63333333
 0.56140351 0.59649123 0.64516129 0.5862069 ]

mean value: 0.5862948936385473

key: test_precision
value: [0.         0.5        0.66666667 0.33333333 1.         0.5
 0.33333333 0.66666667 0.33333333 0.5       ]

mean value: 0.48333333333333334

key: train_precision
value: [0.76       0.72       0.7        0.78947368 0.66666667 0.76
 0.72727273 0.77272727 0.74074074 0.77272727]

mean value: 0.7409608364345206

key: test_recall
value: [0.         0.25       0.5        0.25       0.75       0.25
 0.25       0.5        0.25       0.33333333]

mean value: 0.3333333333333333

key: train_recall
value: [0.54285714 0.51428571 0.4        0.42857143 0.45714286 0.54285714
 0.45714286 0.48571429 0.57142857 0.47222222]

mean value: 0.4872222222222222

key: test_roc_auc
value: [0.42857143 0.55357143 0.67857143 0.48214286 0.875      0.55357143
 0.48214286 0.67857143 0.48214286 0.5952381 ]

mean value: 0.580952380952381

key: train_roc_auc
value: [0.72380952 0.7015873  0.65238095 0.68253968 0.66507937 0.72380952
 0.68095238 0.7031746  0.73015873 0.69642857]

mean value: 0.6959920634920634

key: test_jcc
value: [0.         0.2        0.4        0.16666667 0.75       0.2
 0.16666667 0.4        0.16666667 0.25      ]

mean value: 0.27

key: train_jcc
value: [0.46341463 0.42857143 0.34146341 0.38461538 0.37209302 0.46341463
 0.3902439  0.425      0.47619048 0.41463415]

mean value: 0.415964104434042

MCC on Blind test: 0.17

Accuracy on Blind test: 0.6

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01114202 0.01079059 0.01085377 0.01060557 0.01079392 0.00984693
 0.01081705 0.01012373 0.01036024 0.01075292]

mean value: 0.010608673095703125

key: score_time
value: [0.00997567 0.0099051  0.01002312 0.00994086 0.010077   0.01012874
 0.00993395 0.00938058 0.00973964 0.00926208]

mean value: 0.009836673736572266

key: test_mcc
value: [0.         0.13363062 0.13363062 0.41833001 0.         0.41833001
 0.41833001 0.         0.41833001 0.21821789]

mean value: 0.21587991852165678

key: train_mcc
value: [0.57035183 0.6363961  0.6146363  0.59263776 0.59263776 0.54772256
 0.57035183 0.54772256 0.6146363  0.55901699]

mean value: 0.5846109973682458

key: test_accuracy
value: [0.63636364 0.63636364 0.63636364 0.72727273 0.63636364 0.72727273
 0.72727273 0.63636364 0.72727273 0.7       ]

mean value: 0.6790909090909091

key: train_accuracy
value: [0.79591837 0.82653061 0.81632653 0.80612245 0.80612245 0.78571429
 0.79591837 0.78571429 0.81632653 0.78787879]

mean value: 0.8022572665429808

key: test_fscore
value: [0.         0.33333333 0.33333333 0.4        0.         0.4
 0.4        0.         0.4        0.4       ]

mean value: 0.26666666666666666

key: train_fscore
value: [0.6        0.67924528 0.65384615 0.62745098 0.62745098 0.57142857
 0.6        0.57142857 0.65384615 0.58823529]

mean value: 0.6172931988470279

key: test_precision
value: [0.  0.5 0.5 1.  0.  1.  1.  0.  1.  0.5]

mean value: 0.55

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.         0.25       0.25       0.25       0.         0.25
 0.25       0.         0.25       0.33333333]

mean value: 0.18333333333333332

key: train_recall
value: [0.42857143 0.51428571 0.48571429 0.45714286 0.45714286 0.4
 0.42857143 0.4        0.48571429 0.41666667]

mean value: 0.4473809523809524

key: test_roc_auc
value: [0.5        0.55357143 0.55357143 0.625      0.5        0.625
 0.625      0.5        0.625      0.5952381 ]

mean value: 0.5702380952380952

key: train_roc_auc
value: [0.71428571 0.75714286 0.74285714 0.72857143 0.72857143 0.7
 0.71428571 0.7        0.74285714 0.70833333]

mean value: 0.7236904761904762

key: test_jcc
value: [0.   0.2  0.2  0.25 0.   0.25 0.25 0.   0.25 0.25]

mean value: 0.165

key: train_jcc
value: [0.42857143 0.51428571 0.48571429 0.45714286 0.45714286 0.4
 0.42857143 0.4        0.48571429 0.41666667]

mean value: 0.4473809523809524

MCC on Blind test: 0.61

Accuracy on Blind test: 0.8

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.52748394 0.58608747 0.57734966 0.51142216 0.66457558 0.65659547
 0.45914388 0.5262475  0.48919296 0.69721627]

mean value: 0.5695314884185791

key: score_time
value: [0.01240587 0.01868153 0.0124526  0.01238823 0.01570678 0.01239204
 0.01241803 0.01240945 0.01244521 0.01239586]

mean value: 0.013369560241699219

key: test_mcc
value: [0.60714286 0.13363062 0.21428571 0.21428571 0.60714286 0.44854261
 0.46291005 0.60714286 0.38575837 0.21821789]

mean value: 0.389905954955624

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.81818182 0.63636364 0.63636364 0.63636364 0.81818182 0.72727273
 0.63636364 0.81818182 0.72727273 0.7       ]

mean value: 0.7154545454545455

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.33333333 0.5        0.5        0.75       0.66666667
 0.66666667 0.75       0.57142857 0.4       ]

mean value: 0.5888095238095238

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.5        0.5        0.5        0.75       0.6
 0.5        0.75       0.66666667 0.5       ]

mean value: 0.6016666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.25       0.5        0.5        0.75       0.75
 1.         0.75       0.5        0.33333333]

mean value: 0.6083333333333333

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.80357143 0.55357143 0.60714286 0.60714286 0.80357143 0.73214286
 0.71428571 0.80357143 0.67857143 0.5952381 ]

mean value: 0.6898809523809524

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.2        0.33333333 0.33333333 0.6        0.5
 0.5        0.6        0.4        0.25      ]

mean value: 0.43166666666666664

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.0163362  0.01244402 0.01193404 0.01061082 0.01098156 0.0116725
 0.01134181 0.01184058 0.01182842 0.01150846]

mean value: 0.01204984188079834

key: score_time
value: [0.01191998 0.0099411  0.00993514 0.0090301  0.0094378  0.00917912
 0.00951123 0.00942707 0.00954127 0.01003218]

mean value: 0.009795498847961426

key: test_mcc
value: [0.38575837 0.82807867 0.82807867 1.         1.         0.60714286
 0.60714286 0.82807867 0.60714286 0.76376262]

mean value: 0.745518557579225

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.90909091 0.90909091 1.         1.         0.81818182
 0.81818182 0.90909091 0.81818182 0.9       ]

mean value: 0.8809090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.57142857 0.88888889 0.88888889 1.         1.         0.75
 0.75       0.88888889 0.75       0.8       ]

mean value: 0.8288095238095239

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.8        0.8        1.         1.         0.75
 0.75       0.8        0.75       1.        ]

mean value: 0.8316666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        1.         1.         1.         1.         0.75
 0.75       1.         0.75       0.66666667]

mean value: 0.8416666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.92857143 0.92857143 1.         1.         0.80357143
 0.80357143 0.92857143 0.80357143 0.83333333]

mean value: 0.8708333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.4        0.8        0.8        1.         1.         0.6
 0.6        0.8        0.6        0.66666667]

mean value: 0.7266666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.67

Accuracy on Blind test: 0.8

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09402609 0.09229231 0.09156442 0.08829975 0.08816385 0.08814311
 0.08624172 0.08659744 0.0860436  0.08706236]

mean value: 0.0888434648513794

key: score_time
value: [0.01869106 0.01854372 0.01791668 0.01841903 0.01806092 0.01795483
 0.01827717 0.01738906 0.01821613 0.01852036]

mean value: 0.018198895454406738

key: test_mcc
value: [0.62360956 0.06900656 0.38575837 0.44854261 0.38575837 0.21428571
 0.21428571 0.62360956 0.62360956 0.50917508]

mean value: 0.409764111849294

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.81818182 0.54545455 0.72727273 0.72727273 0.72727273 0.63636364
 0.63636364 0.81818182 0.81818182 0.8       ]

mean value: 0.7254545454545455

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.44444444 0.57142857 0.66666667 0.57142857 0.5
 0.5        0.66666667 0.66666667 0.5       ]

mean value: 0.5753968253968254

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.4        0.66666667 0.6        0.66666667 0.5
 0.5        1.         1.         1.        ]

mean value: 0.7333333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        0.5        0.5        0.75       0.5        0.5
 0.5        0.5        0.5        0.33333333]

mean value: 0.5083333333333333

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.53571429 0.67857143 0.73214286 0.67857143 0.60714286
 0.60714286 0.75       0.75       0.66666667]

mean value: 0.6755952380952381

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.28571429 0.4        0.5        0.4        0.33333333
 0.33333333 0.5        0.5        0.33333333]

mean value: 0.4085714285714286

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00997877 0.00966382 0.00917029 0.009238   0.0097146  0.00969219
 0.00935483 0.00956774 0.00983381 0.0093317 ]

mean value: 0.009554576873779298

key: score_time
value: [0.00899959 0.008461   0.00906372 0.00885987 0.00935435 0.00913429
 0.00945234 0.0089314  0.0090282  0.00912428]

mean value: 0.009040904045104981

key: test_mcc
value: [ 0.62360956 -0.3105295   0.38575837  0.57142857  0.82807867  0.13363062
  0.21428571  1.         -0.03857584 -0.08908708]

mean value: 0.3318599097416819

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.81818182 0.36363636 0.72727273 0.72727273 0.90909091 0.63636364
 0.63636364 1.         0.54545455 0.5       ]

mean value: 0.6863636363636364

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.22222222 0.57142857 0.72727273 0.88888889 0.33333333
 0.5        1.         0.28571429 0.28571429]

mean value: 0.5481240981240981

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.2        0.66666667 0.57142857 0.8        0.5
 0.5        1.         0.33333333 0.25      ]

mean value: 0.5821428571428572

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        0.25       0.5        1.         1.         0.25
 0.5        1.         0.25       0.33333333]

mean value: 0.5583333333333333

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.33928571 0.67857143 0.78571429 0.92857143 0.55357143
 0.60714286 1.         0.48214286 0.45238095]

mean value: 0.6577380952380952

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.125      0.4        0.57142857 0.8        0.2
 0.33333333 1.         0.16666667 0.16666667]

mean value: 0.4263095238095238

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.1096015  1.08083749 1.08970332 1.12753987 1.06541085 1.056283
 1.06098294 1.083395   1.07819152 1.06367922]

mean value: 1.0815624713897705

key: score_time
value: [0.09377718 0.09102583 0.0940125  0.09139752 0.08611059 0.08667684
 0.08917809 0.09408355 0.08660865 0.09382653]

mean value: 0.09066972732543946

key: test_mcc
value: [0.41833001 0.60714286 0.62360956 0.81009259 0.60714286 0.38575837
 0.44854261 1.         0.41833001 0.50917508]

mean value: 0.5828123958278172

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.81818182 0.81818182 0.90909091 0.81818182 0.72727273
 0.72727273 1.         0.72727273 0.8       ]

mean value: 0.8072727272727273

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.4        0.75       0.66666667 0.85714286 0.75       0.57142857
 0.66666667 1.         0.4        0.5       ]

mean value: 0.6561904761904762

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       1.         1.         0.75       0.66666667
 0.6        1.         1.         1.        ]

mean value: 0.8766666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.25       0.75       0.5        0.75       0.75       0.5
 0.75       1.         0.25       0.33333333]

mean value: 0.5833333333333334

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.625      0.80357143 0.75       0.875      0.80357143 0.67857143
 0.73214286 1.         0.625      0.66666667]

mean value: 0.7559523809523809

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.25       0.6        0.5        0.75       0.6        0.4
 0.5        1.         0.25       0.33333333]

mean value: 0.5183333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(

key: fit_time
value: [1.7521224  0.92771101 0.87955499 0.98863387 0.87796545 0.96173191
 0.81847405 0.84688282 0.88875556 0.87859821]

mean value: 0.9820430278778076

key: score_time
value: [0.22265458 0.22081852 0.21812773 0.23905349 0.11643434 0.23435545
 0.21912909 0.1910212  0.23325014 0.12879086]

mean value: 0.20236353874206542

key: test_mcc
value: [0.         0.81009259 0.62360956 0.81009259 0.60714286 0.62360956
 0.41833001 0.81009259 0.41833001 0.50917508]

mean value: 0.5630474851721843

key: train_mcc
value: [0.91259839 0.86680492 0.93419873 0.89113279 0.89113279 0.91259839
 0.86978258 0.91259839 0.93419873 0.89319321]

mean value: 0.901823893018008

key: test_accuracy
value: [0.63636364 0.90909091 0.81818182 0.90909091 0.81818182 0.81818182
 0.72727273 0.90909091 0.72727273 0.8       ]

mean value: 0.8072727272727273

key: train_accuracy
value: [0.95918367 0.93877551 0.96938776 0.94897959 0.94897959 0.95918367
 0.93877551 0.95918367 0.96938776 0.94949495]

mean value: 0.9541331684188827

key: test_fscore
value: [0.         0.85714286 0.66666667 0.85714286 0.75       0.66666667
 0.4        0.85714286 0.4        0.5       ]

mean value: 0.5954761904761905

key: train_fscore
value: [0.93939394 0.90909091 0.95522388 0.92307692 0.92307692 0.93939394
 0.90625    0.93939394 0.95522388 0.92537313]

mean value: 0.9315497468948961

key: test_precision
value: [0.   1.   1.   1.   0.75 1.   1.   1.   1.   1.  ]

mean value: 0.875

key: train_precision
value: [1.         0.96774194 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9967741935483871

key: test_recall
value: [0.         0.75       0.5        0.75       0.75       0.5
 0.25       0.75       0.25       0.33333333]

mean value: 0.48333333333333334

key: train_recall
value: [0.88571429 0.85714286 0.91428571 0.85714286 0.85714286 0.88571429
 0.82857143 0.88571429 0.91428571 0.86111111]

mean value: 0.8746825396825396

key: test_roc_auc
value: [0.5        0.875      0.75       0.875      0.80357143 0.75
 0.625      0.875      0.625      0.66666667]

mean value: 0.7345238095238096

key: train_roc_auc
value: [0.94285714 0.92063492 0.95714286 0.92857143 0.92857143 0.94285714
 0.91428571 0.94285714 0.95714286 0.93055556]

mean value: 0.9365476190476191

key: test_jcc
value: [0.         0.75       0.5        0.75       0.6        0.5
 0.25       0.75       0.25       0.33333333]

mean value: 0.4683333333333333

key: train_jcc
value: [0.88571429 0.83333333 0.91428571 0.85714286 0.85714286 0.88571429
 0.82857143 0.88571429 0.91428571 0.86111111]

mean value: 0.8723015873015872

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02311182 0.00977755 0.00971484 0.00987244 0.00976491 0.00973606
 0.00970364 0.00972319 0.01002288 0.00982809]

mean value: 0.011125540733337403

key: score_time
value: [0.01044178 0.00948119 0.00961089 0.00950241 0.00949287 0.0095408
 0.00925827 0.00945497 0.00956225 0.0095191 ]

mean value: 0.009586453437805176

key: test_mcc
value: [ 0.41833001 -0.03857584 -0.35634832  0.13363062  0.81009259  0.13363062
  0.44854261 -0.35634832 -0.23904572  0.04761905]

mean value: 0.10015272992148226

key: train_mcc
value: [0.58972429 0.61549072 0.62652663 0.56504712 0.54087139 0.5890183
 0.64246907 0.61976141 0.58972429 0.58367897]

mean value: 0.5962312182242981

key: test_accuracy
value: [0.72727273 0.54545455 0.45454545 0.63636364 0.90909091 0.63636364
 0.72727273 0.45454545 0.54545455 0.6       ]

mean value: 0.6236363636363637

key: train_accuracy
value: [0.81632653 0.82653061 0.82653061 0.80612245 0.79591837 0.81632653
 0.83673469 0.82653061 0.81632653 0.80808081]

mean value: 0.8175427746856319

key: test_fscore
value: [0.4        0.28571429 0.         0.33333333 0.85714286 0.33333333
 0.66666667 0.         0.         0.33333333]

mean value: 0.32095238095238093

key: train_fscore
value: [0.7        0.71186441 0.69090909 0.68852459 0.66666667 0.70967742
 0.72413793 0.70175439 0.7        0.66666667]

mean value: 0.6960201157540253

key: test_precision
value: [1.         0.33333333 0.         0.5        1.         0.5
 0.6        0.         0.         0.33333333]

mean value: 0.42666666666666664

key: train_precision
value: [0.84       0.875      0.95       0.80769231 0.8        0.81481481
 0.91304348 0.90909091 0.84       0.9047619 ]

mean value: 0.8654403414620806

key: test_recall
value: [0.25       0.25       0.         0.25       0.75       0.25
 0.75       0.         0.         0.33333333]

mean value: 0.2833333333333333

key: train_recall
value: [0.6        0.6        0.54285714 0.6        0.57142857 0.62857143
 0.6        0.57142857 0.6        0.52777778]

mean value: 0.5842063492063492

key: test_roc_auc
value: [0.625      0.48214286 0.35714286 0.55357143 0.875      0.55357143
 0.73214286 0.35714286 0.42857143 0.52380952]

mean value: 0.5488095238095239

key: train_roc_auc
value: [0.76825397 0.77619048 0.76349206 0.76031746 0.74603175 0.77460317
 0.78412698 0.76984127 0.76825397 0.74801587]

mean value: 0.7659126984126984

key: test_jcc
value: [0.25       0.16666667 0.         0.2        0.75       0.2
 0.5        0.         0.         0.2       ]

mean value: 0.22666666666666668

key: train_jcc
value: [0.53846154 0.55263158 0.52777778 0.525      0.5        0.55
 0.56756757 0.54054054 0.53846154 0.5       ]

mean value: 0.5340440541756332

MCC on Blind test: 0.36

Accuracy on Blind test: 0.7

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07823944 0.05303431 0.04373765 0.04487467 0.04556298 0.04342461
 0.06485868 0.04147911 0.04427385 0.04476452]

mean value: 0.05042498111724854

key: score_time
value: [0.01064777 0.01035261 0.01131892 0.01060677 0.01130295 0.01109171
 0.01039028 0.01056647 0.01087427 0.01112318]

mean value: 0.01082749366760254

key: test_mcc
value: [0.38575837 0.82807867 1.         1.         1.         0.60714286
 0.81009259 0.82807867 0.81009259 1.        ]

mean value: 0.8269243749071702

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.90909091 1.         1.         1.         0.81818182
 0.90909091 0.90909091 0.90909091 1.        ]

mean value: 0.9181818181818182

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.57142857 0.88888889 1.         1.         1.         0.75
 0.85714286 0.88888889 0.85714286 1.        ]

mean value: 0.8813492063492063

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.8        1.         1.         1.         0.75
 1.         0.8        1.         1.        ]

mean value: 0.9016666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  1.   1.   1.   1.   0.75 0.75 1.   0.75 1.  ]

mean value: 0.875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.92857143 1.         1.         1.         0.80357143
 0.875      0.92857143 0.875      1.        ]

mean value: 0.9089285714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.4  0.8  1.   1.   1.   0.6  0.75 0.8  0.75 1.  ]

mean value: 0.81

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03964543 0.0346694  0.04520273 0.07001829 0.05058432 0.05238247
 0.04535532 0.04525828 0.04547024 0.04568648]

mean value: 0.04742729663848877

key: score_time
value: [0.0121448  0.02086496 0.02063799 0.02355075 0.02194834 0.02153063
 0.02106428 0.02080369 0.02024055 0.02344513]

mean value: 0.020623111724853517

key: test_mcc
value: [ 0.38575837 -0.06900656 -0.06900656 -0.17857143  0.3105295   0.21428571
  0.3105295  -0.03857584  0.38575837  0.08908708]

mean value: 0.1340788170211345

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.45454545 0.45454545 0.45454545 0.63636364 0.63636364
 0.63636364 0.54545455 0.72727273 0.5       ]

mean value: 0.5772727272727273

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.57142857 0.4        0.4        0.25       0.6        0.5
 0.6        0.28571429 0.57142857 0.44444444]

mean value: 0.4623015873015873

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.33333333 0.33333333 0.25       0.5        0.5
 0.5        0.33333333 0.66666667 0.33333333]

mean value: 0.44166666666666665

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        0.5        0.5        0.25       0.75       0.5
 0.75       0.25       0.5        0.66666667]

mean value: 0.5166666666666666

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.46428571 0.46428571 0.41071429 0.66071429 0.60714286
 0.66071429 0.48214286 0.67857143 0.54761905]

mean value: 0.5654761904761905

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.4        0.25       0.25       0.14285714 0.42857143 0.33333333
 0.42857143 0.16666667 0.4        0.28571429]

mean value: 0.30857142857142855

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01844668 0.00899911 0.0086031  0.008533   0.008605   0.00961709
 0.00861883 0.00964332 0.00905418 0.00862718]

mean value: 0.009874749183654784

key: score_time
value: [0.00925779 0.00960422 0.0085597  0.00856972 0.00883079 0.00896478
 0.00923276 0.00896192 0.00885677 0.00844169]

mean value: 0.008928012847900391

key: test_mcc
value: [ 0.62360956  0.44854261 -0.17857143 -0.06900656  0.38575837  0.62360956
  0.38575837 -0.23904572  0.38575837  0.04761905]

mean value: 0.24140322084593716

key: train_mcc
value: [0.35970916 0.34202233 0.45466371 0.47527082 0.40887025 0.34383703
 0.36307678 0.46857566 0.38924947 0.39514539]

mean value: 0.40004206061519487

key: test_accuracy
value: [0.81818182 0.72727273 0.45454545 0.45454545 0.72727273 0.81818182
 0.72727273 0.54545455 0.72727273 0.6       ]

mean value: 0.66

key: train_accuracy
value: [0.70408163 0.69387755 0.75510204 0.76530612 0.73469388 0.70408163
 0.71428571 0.76530612 0.7244898  0.72727273]

mean value: 0.7288497217068646

key: test_fscore
value: [0.66666667 0.66666667 0.25       0.4        0.57142857 0.66666667
 0.57142857 0.         0.57142857 0.33333333]

mean value: 0.46976190476190477

key: train_fscore
value: [0.5915493  0.58333333 0.63636364 0.64615385 0.60606061 0.56716418
 0.57575758 0.62295082 0.59701493 0.59701493]

mean value: 0.6023363142966522

key: test_precision
value: [1.         0.6        0.25       0.33333333 0.66666667 1.
 0.66666667 0.         0.66666667 0.33333333]

mean value: 0.5516666666666666

key: train_precision
value: [0.58333333 0.56756757 0.67741935 0.7        0.64516129 0.59375
 0.61290323 0.73076923 0.625      0.64516129]

mean value: 0.6381065292960454

key: test_recall
value: [0.5        0.75       0.25       0.5        0.5        0.5
 0.5        0.         0.5        0.33333333]

mean value: 0.43333333333333335

key: train_recall
value: [0.6        0.6        0.6        0.6        0.57142857 0.54285714
 0.54285714 0.54285714 0.57142857 0.55555556]

mean value: 0.5726984126984127

key: test_roc_auc
value: [0.75       0.73214286 0.41071429 0.46428571 0.67857143 0.75
 0.67857143 0.42857143 0.67857143 0.52380952]

mean value: 0.6095238095238095

key: train_roc_auc
value: [0.68095238 0.67301587 0.72063492 0.72857143 0.6984127  0.66825397
 0.67619048 0.71587302 0.69047619 0.69047619]

mean value: 0.6942857142857143

key: test_jcc
value: [0.5        0.5        0.14285714 0.25       0.4        0.5
 0.4        0.         0.4        0.2       ]

mean value: 0.3292857142857143

key: train_jcc
value: [0.42       0.41176471 0.46666667 0.47727273 0.43478261 0.39583333
 0.40425532 0.45238095 0.42553191 0.42553191]

mean value: 0.4314020143167855

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0098691  0.01330066 0.01350284 0.01457024 0.01449943 0.01446772
 0.01424551 0.01335931 0.01429367 0.01426768]

mean value: 0.013637614250183106

key: score_time
value: [0.0086205  0.01121116 0.01108789 0.01153731 0.01164913 0.01202822
 0.01170802 0.01162171 0.011621   0.01158166]

mean value: 0.011266660690307618

key: test_mcc
value: [-0.03857584  0.13363062  0.21428571  0.69006556  0.82807867  0.38575837
  0.57142857  0.81009259  0.38575837  0.76376262]

mean value: 0.474428525267057

key: train_mcc
value: [0.95703496 0.78513588 0.93314074 0.93419873 0.95555556 0.95555556
 0.97788036 0.88878629 1.         0.93435698]

mean value: 0.932164505675344

key: test_accuracy
value: [0.54545455 0.63636364 0.63636364 0.81818182 0.90909091 0.72727273
 0.72727273 0.90909091 0.72727273 0.9       ]

mean value: 0.7536363636363637

key: train_accuracy
value: [0.97959184 0.89795918 0.96938776 0.96938776 0.97959184 0.97959184
 0.98979592 0.94897959 1.         0.96969697]

mean value: 0.9683982683982684

key: test_fscore
value: [0.28571429 0.33333333 0.5        0.8        0.88888889 0.57142857
 0.72727273 0.85714286 0.57142857 0.8       ]

mean value: 0.6335209235209236

key: train_fscore/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

value: [0.97222222 0.83333333 0.95652174 0.95522388 0.97142857 0.97142857
 0.98550725 0.92537313 1.         0.95774648]

mean value: 0.9528785177718557

key: test_precision
value: [0.33333333 0.5        0.5        0.66666667 0.8        0.66666667
 0.57142857 1.         0.66666667 1.        ]

mean value: 0.6704761904761904

key: train_precision
value: [0.94594595 1.         0.97058824 1.         0.97142857 0.97142857
 1.         0.96875    1.         0.97142857]

mean value: 0.9799569895525778

key: test_recall
value: [0.25       0.25       0.5        1.         1.         0.5
 1.         0.75       0.5        0.66666667]

mean value: 0.6416666666666666

key: train_recall
value: [1.         0.71428571 0.94285714 0.91428571 0.97142857 0.97142857
 0.97142857 0.88571429 1.         0.94444444]

mean value: 0.9315873015873015

key: test_roc_auc
value: [0.48214286 0.55357143 0.60714286 0.85714286 0.92857143 0.67857143
 0.78571429 0.875      0.67857143 0.83333333]

mean value: 0.7279761904761904

key: train_roc_auc
value: [0.98412698 0.85714286 0.96349206 0.95714286 0.97777778 0.97777778
 0.98571429 0.93492063 1.         0.96428571]

mean value: 0.9602380952380953

key: test_jcc
value: [0.16666667 0.2        0.33333333 0.66666667 0.8        0.4
 0.57142857 0.75       0.4        0.66666667]

mean value: 0.49547619047619046

key: train_jcc
value: [0.94594595 0.71428571 0.91666667 0.91428571 0.94444444 0.94444444
 0.97142857 0.86111111 1.         0.91891892]

mean value: 0.9131531531531532

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01281071 0.01247859 0.01289988 0.01285005 0.01277208 0.01324821
 0.01254749 0.01382518 0.012918   0.01293874]

mean value: 0.0129288911819458

key: score_time
value: [0.01064682 0.01162243 0.01160169 0.01163006 0.01156688 0.01157141
 0.01158237 0.01171994 0.01172638 0.01154923]

mean value: 0.011521720886230468

key: test_mcc
value: [0.         0.13363062 0.38575837 0.35634832 0.69006556 0.3105295
 0.44854261 0.81009259 0.21428571 0.35634832]

mean value: 0.3705601617166881

key: train_mcc
value: [0.72183975 0.84366149 0.84852814 0.4361866  0.6876961  0.68680282
 0.91259839 0.84852814 0.84296756 0.87139937]

mean value: 0.7700208366419465

key: test_accuracy
value: [0.63636364 0.63636364 0.72727273 0.54545455 0.81818182 0.63636364
 0.72727273 0.90909091 0.63636364 0.7       ]

mean value: 0.6972727272727273

key: train_accuracy
value: [0.86734694 0.92857143 0.92857143 0.6122449  0.82653061 0.81632653
 0.95918367 0.92857143 0.91836735 0.93939394]

mean value: 0.8725108225108226

key: test_fscore
value: [0.         0.33333333 0.57142857 0.61538462 0.8        0.6
 0.66666667 0.85714286 0.5        0.57142857]

mean value: 0.5515384615384615

key: train_fscore
value: [0.77192982 0.89855072 0.88888889 0.64814815 0.8        0.79545455
 0.93939394 0.88888889 0.8974359  0.91891892]

mean value: 0.8447609776328312

key: test_precision
value: [0.         0.5        0.66666667 0.44444444 0.66666667 0.5
 0.6        1.         0.5        0.5       ]

mean value: 0.5377777777777778

key: train_precision
value: [1.         0.91176471 1.         0.47945205 0.68       0.66037736
 1.         1.         0.81395349 0.89473684]

mean value: 0.8440284449644796

key: test_recall
value: [0.         0.25       0.5        1.         1.         0.75
 0.75       0.75       0.5        0.66666667]

mean value: 0.6166666666666667

key: train_recall
value: [0.62857143 0.88571429 0.8        1.         0.97142857 1.
 0.88571429 0.8        1.         0.94444444]

mean value: 0.8915873015873016

key: test_roc_auc
value: [0.5        0.55357143 0.67857143 0.64285714 0.85714286 0.66071429
 0.73214286 0.875      0.60714286 0.69047619]

mean value: 0.6797619047619048

key: train_roc_auc
value: [0.81428571 0.91904762 0.9        0.6984127  0.85873016 0.85714286
 0.94285714 0.9        0.93650794 0.94047619]

mean value: 0.8767460317460317

key: test_jcc
value: [0.         0.2        0.4        0.44444444 0.66666667 0.42857143
 0.5        0.75       0.33333333 0.4       ]

mean value: 0.4123015873015873

key: train_jcc
value: [0.62857143 0.81578947 0.8        0.47945205 0.66666667 0.66037736
 0.88571429 0.8        0.81395349 0.85      ]

mean value: 0.7400524756293771

MCC on Blind test: 0.61

Accuracy on Blind test: 0.8

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.09375739 0.08521581 0.08716512 0.08840275 0.09140325 0.08998203
 0.08835912 0.08615708 0.08548617 0.08902049]

mean value: 0.08849492073059081

key: score_time
value: [0.01481247 0.0147016  0.01527953 0.01528096 0.0153563  0.01549649
 0.01509285 0.01497459 0.01454949 0.01461101]

mean value: 0.015015530586242675

key: test_mcc
value: [0.38575837 0.82807867 1.         1.         1.         0.44854261
 0.60714286 0.82807867 0.81009259 0.76376262]

mean value: 0.7671456391169224

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.90909091 1.         1.         1.         0.72727273
 0.81818182 0.90909091 0.90909091 0.9       ]

mean value: 0.89

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.57142857 0.88888889 1.         1.         1.         0.66666667
 0.75       0.88888889 0.85714286 0.8       ]

mean value: 0.8423015873015873

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.8        1.         1.         1.         0.6
 0.75       0.8        1.         1.        ]

mean value: 0.8616666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        1.         1.         1.         1.         0.75
 0.75       1.         0.75       0.66666667]

mean value: 0.8416666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.92857143 1.         1.         1.         0.73214286
 0.80357143 0.92857143 0.875      0.83333333]

mean value: 0.8779761904761905

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.4        0.8        1.         1.         1.         0.5
 0.6        0.8        0.75       0.66666667]

mean value: 0.7516666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04171443 0.03725171 0.0328362  0.03903294 0.05751514 0.0520432
 0.04804182 0.04658794 0.04486704 0.03777409]

mean value: 0.043766450881958005

key: score_time
value: [0.02665997 0.02416825 0.02867103 0.02760959 0.03714466 0.03697157
 0.03046608 0.02323723 0.01972747 0.03126526]

mean value: 0.02859210968017578

key: test_mcc
value: [0.38575837 0.82807867 1.         1.         1.         0.60714286
 1.         0.82807867 0.81009259 0.76376262]

mean value: 0.8222913777596693

key: train_mcc
value: [0.97788036 1.         1.         1.         1.         1.
 1.         0.95555556 0.97788036 1.        ]

mean value: 0.991131627711635

key: test_accuracy
value: [0.72727273 0.90909091 1.         1.         1.         0.81818182
 1.         0.90909091 0.90909091 0.9       ]

mean value: 0.9172727272727272

key: train_accuracy
value: [0.98979592 1.         1.         1.         1.         1.
 1.         0.97959184 0.98979592 1.        ]

mean value: 0.9959183673469387

key: test_fscore
value: [0.57142857 0.88888889 1.         1.         1.         0.75
 1.         0.88888889 0.85714286 0.8       ]

mean value: 0.8756349206349207

key: train_fscore
value: [0.98550725 1.         1.         1.         1.         1.
 1.         0.97142857 0.98550725 1.        ]

mean value: 0.9942443064182195

key: test_precision
value: [0.66666667 0.8        1.         1.         1.         0.75
 1.         0.8        1.         1.        ]

mean value: 0.9016666666666666

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         0.97142857 1.         1.        ]

mean value: 0.9971428571428571

key: test_recall
value: [0.5        1.         1.         1.         1.         0.75
 1.         1.         0.75       0.66666667]

mean value: 0.8666666666666667

key: train_recall
value: [0.97142857 1.         1.         1.         1.         1.
 1.         0.97142857 0.97142857 1.        ]

mean value: 0.9914285714285714

key: test_roc_auc
value: [0.67857143 0.92857143 1.         1.         1.         0.80357143
 1.         0.92857143 0.875      0.83333333]

mean value: 0.9047619047619048

key: train_roc_auc
value: [0.98571429 1.         1.         1.         1.         1.
 1.         0.97777778 0.98571429 1.        ]

mean value: 0.994920634920635

key: test_jcc
value: [0.4        0.8        1.         1.         1.         0.6
 1.         0.8        0.75       0.66666667]

mean value: 0.8016666666666666

key: train_jcc
value: [0.97142857 1.         1.         1.         1.         1.
 1.         0.94444444 0.97142857 1.        ]

mean value: 0.9887301587301587

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.02844262 0.05032802 0.04449534 0.04859734 0.04044652 0.04043579
 0.04094768 0.04056787 0.04082155 0.04117346]

mean value: 0.04162561893463135

key: score_time
value: [0.02062154 0.02379227 0.02072287 0.02167654 0.02285767 0.02385473
 0.02424192 0.0208075  0.02163959 0.02190232]

mean value: 0.022211694717407228

key: test_mcc
value: [0.62360956 0.06900656 0.13363062 0.13363062 0.41833001 0.21428571
 0.21428571 0.62360956 0.41833001 0.52380952]

mean value: 0.3372527905686335

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.81818182 0.54545455 0.63636364 0.63636364 0.72727273 0.63636364
 0.63636364 0.81818182 0.72727273 0.8       ]

mean value: 0.6981818181818182

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.44444444 0.33333333 0.33333333 0.4        0.5
 0.5        0.66666667 0.4        0.66666667]

mean value: 0.4911111111111111

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.4        0.5        0.5        1.         0.5
 0.5        1.         1.         0.66666667]

mean value: 0.7066666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        0.5        0.25       0.25       0.25       0.5
 0.5        0.5        0.25       0.66666667]

mean value: 0.4166666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.53571429 0.55357143 0.55357143 0.625      0.60714286
 0.60714286 0.75       0.625      0.76190476]

mean value: 0.6369047619047619

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.28571429 0.2        0.2        0.25       0.33333333
 0.33333333 0.5        0.25       0.5       ]

mean value: 0.3352380952380952

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.61

Accuracy on Blind test: 0.8

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.17393565 0.20081019 0.19978285 0.19986486 0.19875121 0.21868134
 0.20449281 0.22600913 0.20545721 0.20233464]

mean value: 0.20301198959350586

key: score_time
value: [0.00916028 0.0093646  0.00908971 0.00901794 0.00926399 0.00972033
 0.0092566  0.00972772 0.0093832  0.00969172]

mean value: 0.009367609024047851

key: test_mcc
value: [0.38575837 0.82807867 1.         1.         1.         0.60714286
 0.81009259 0.82807867 0.81009259 0.76376262]

mean value: 0.8033006364897676

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.72727273 0.90909091 1.         1.         1.         0.81818182
 0.90909091 0.90909091 0.90909091 0.9       ]

mean value: 0.9081818181818182

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.57142857 0.88888889 1.         1.         1.         0.75
 0.85714286 0.88888889 0.85714286 0.8       ]

mean value: 0.8613492063492063

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.8        1.         1.         1.         0.75
 1.         0.8        1.         1.        ]

mean value: 0.9016666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        1.         1.         1.         1.         0.75
 0.75       1.         0.75       0.66666667]

mean value: 0.8416666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.92857143 1.         1.         1.         0.80357143
 0.875      0.92857143 0.875      0.83333333]

mean value: 0.8922619047619048

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.4        0.8        1.         1.         1.         0.6
 0.75       0.8        0.75       0.66666667]

mean value: 0.7766666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01376772 0.01510024 0.01516438 0.0148809  0.01546669 0.01493096
 0.01515436 0.014992   0.01509452 0.01527882]

mean value: 0.014983057975769043

key: score_time
value: [0.01197147 0.01196742 0.01193738 0.01205683 0.0138576  0.01202655
 0.01751709 0.01541352 0.01546621 0.01520991]

mean value: 0.013742399215698243

key: test_mcc
value: [ 0.21428571 -0.03857584 -0.46291005  0.60714286  0.13363062  0.06900656
 -0.06900656  0.06900656 -0.46291005 -0.42857143]

mean value: -0.0368901617515484

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.63636364 0.54545455 0.36363636 0.81818182 0.63636364 0.54545455
 0.45454545 0.54545455 0.36363636 0.4       ]

mean value: 0.5309090909090909

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.28571429 0.         0.75       0.33333333 0.44444444
 0.4        0.44444444 0.         0.        ]

mean value: 0.3157936507936508

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.33333333 0.         0.75       0.5        0.4
 0.33333333 0.4        0.         0.        ]

mean value: 0.32166666666666666

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5  0.25 0.   0.75 0.25 0.5  0.5  0.5  0.   0.  ]

mean value: 0.325

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.60714286 0.48214286 0.28571429 0.80357143 0.55357143 0.53571429
 0.46428571 0.53571429 0.28571429 0.28571429]

mean value: 0.48392857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.16666667 0.         0.6        0.2        0.28571429
 0.25       0.28571429 0.         0.        ]

mean value: 0.21214285714285713

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.17

Accuracy on Blind test: 0.4

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02822781 0.01487136 0.0290575  0.033108   0.01291656 0.01286721
 0.03267503 0.03277683 0.03245783 0.03252196]

mean value: 0.026148009300231933

key: score_time
value: [0.01211548 0.01180744 0.0202291  0.01994181 0.01185703 0.0116508
 0.02012253 0.02056456 0.02143598 0.02186871]

mean value: 0.017159342765808105

key: test_mcc
value: [0.38575837 0.3105295  0.38575837 0.44854261 0.82807867 0.44854261
 0.82807867 0.60714286 0.38575837 0.52380952]

mean value: 0.515199957693884

key: train_mcc
value: [0.97788036 0.93314074 0.93314074 0.93314074 0.95555556 0.93314074
 0.97788036 0.93314074 1.         0.93435698]

mean value: 0.9511376935583599

key: test_accuracy
value: [0.72727273 0.63636364 0.72727273 0.72727273 0.90909091 0.72727273
 0.90909091 0.81818182 0.72727273 0.8       ]

mean value: 0.7709090909090909

key: train_accuracy
value: [0.98979592 0.96938776 0.96938776 0.96938776 0.97959184 0.96938776
 0.98979592 0.96938776 1.         0.96969697]

mean value: 0.9775819418676561

key: test_fscore
value: [0.57142857 0.6        0.57142857 0.66666667 0.88888889 0.66666667
 0.88888889 0.75       0.57142857 0.66666667]

mean value: 0.6842063492063493

key: train_fscore
value: [0.98550725 0.95652174 0.95652174 0.95652174 0.97142857 0.95652174
 0.98550725 0.95652174 1.         0.95774648]

mean value: 0.9682798238707608

key: test_precision
value: [0.66666667 0.5        0.66666667 0.6        0.8        0.6
 0.8        0.75       0.66666667 0.66666667]

mean value: 0.6716666666666666

key: train_precision
value: [1.         0.97058824 0.97058824 0.97058824 0.97142857 0.97058824
 1.         0.97058824 1.         0.97142857]

mean value: 0.9795798319327731

key: test_recall
value: [0.5        0.75       0.5        0.75       1.         0.75
 1.         0.75       0.5        0.66666667]

mean value: 0.7166666666666667

key: train_recall
value: [0.97142857 0.94285714 0.94285714 0.94285714 0.97142857 0.94285714
 0.97142857 0.94285714 1.         0.94444444]

mean value: 0.9573015873015873

key: test_roc_auc
value: [0.67857143 0.66071429 0.67857143 0.73214286 0.92857143 0.73214286
 0.92857143 0.80357143 0.67857143 0.76190476]

mean value: 0.7583333333333333

key: train_roc_auc
value: [0.98571429 0.96349206 0.96349206 0.96349206 0.97777778 0.96349206
 0.98571429 0.96349206 1.         0.96428571]

mean value: 0.9730952380952382

key: test_jcc
value: [0.4        0.42857143 0.4        0.5        0.8        0.5
 0.8        0.6        0.4        0.5       ]

mean value: 0.5328571428571429

key: train_jcc
value: [0.97142857 0.91666667 0.91666667 0.91666667 0.94444444 0.91666667
 0.97142857 0.91666667 1.         0.91891892]

mean value: 0.9389553839553839

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.16137671 0.18871665 0.19555187 0.18758488 0.1897881  0.20349956
 0.2113049  0.19966245 0.18951869 0.24313569]

mean value: 0.1970139503479004

key: score_time
value: [0.02137899 0.02143884 0.02272177 0.01185441 0.02316952 0.02063107
 0.02193546 0.02300262 0.02248096 0.02047634]

mean value: 0.0209089994430542

key: test_mcc
value: [0.38575837 0.3105295  0.38575837 0.44854261 0.82807867 0.44854261
 0.21428571 0.60714286 0.38575837 0.52380952]

mean value: 0.45382066200137294

key: train_mcc
value: [0.97788036 0.93314074 0.93314074 0.93314074 0.95555556 0.93314074
 0.78513588 0.93314074 1.         0.93435698]

mean value: 0.9318632458688522

key: test_accuracy
value: [0.72727273 0.63636364 0.72727273 0.72727273 0.90909091 0.72727273
 0.63636364 0.81818182 0.72727273 0.8       ]

mean value: 0.7436363636363637

key: train_accuracy
value: [0.98979592 0.96938776 0.96938776 0.96938776 0.97959184 0.96938776
 0.89795918 0.96938776 1.         0.96969697]

mean value: 0.9683982683982684

key: test_fscore
value: [0.57142857 0.6        0.57142857 0.66666667 0.88888889 0.66666667
 0.5        0.75       0.57142857 0.66666667]

mean value: 0.6453174603174603

key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:107: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:110: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98550725 0.95652174 0.95652174 0.95652174 0.97142857 0.95652174
 0.83333333 0.95652174 1.         0.95774648]

mean value: 0.953062432566413

key: test_precision
value: [0.66666667 0.5        0.66666667 0.6        0.8        0.6
 0.5        0.75       0.66666667 0.66666667]

mean value: 0.6416666666666666

key: train_precision
value: [1.         0.97058824 0.97058824 0.97058824 0.97142857 0.97058824
 1.         0.97058824 1.         0.97142857]

mean value: 0.9795798319327731

key: test_recall
value: [0.5        0.75       0.5        0.75       1.         0.75
 0.5        0.75       0.5        0.66666667]

mean value: 0.6666666666666666

key: train_recall
value: [0.97142857 0.94285714 0.94285714 0.94285714 0.97142857 0.94285714
 0.71428571 0.94285714 1.         0.94444444]

mean value: 0.9315873015873015

key: test_roc_auc
value: [0.67857143 0.66071429 0.67857143 0.73214286 0.92857143 0.73214286
 0.60714286 0.80357143 0.67857143 0.76190476]

mean value: 0.7261904761904762

key: train_roc_auc
value: [0.98571429 0.96349206 0.96349206 0.96349206 0.97777778 0.96349206
 0.85714286 0.96349206 1.         0.96428571]

mean value: 0.9602380952380953

key: test_jcc
value: [0.4        0.42857143 0.4        0.5        0.8        0.5
 0.33333333 0.6        0.4        0.5       ]

mean value: 0.4861904761904762

key: train_jcc
value: [0.97142857 0.91666667 0.91666667 0.91666667 0.94444444 0.91666667
 0.71428571 0.91666667 1.         0.91891892]

mean value: 0.9132410982410982

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02637744 0.0271399  0.01950192 0.02762055 0.02170396 0.02873182
 0.02581763 0.02583599 0.02582002 0.0221951 ]

mean value: 0.02507443428039551

key: score_time
value: [0.01182771 0.01178241 0.01171708 0.01184201 0.01175046 0.01182175
 0.01181841 0.01186657 0.01179719 0.01182365]

mean value: 0.011804723739624023

key: test_mcc
value: [0.74535599 0.57735027 0.74535599 0.74535599 0.71428571 0.8660254
 0.63245553 0.52223297 0.57735027 0.4472136 ]

mean value: 0.6572981729349922

key: train_mcc
value: [0.92075092 0.90521816 0.88900089 0.88989842 0.88989842 0.90659109
 0.90521816 0.87345612 0.92075092 0.87478088]

mean value: 0.8975563989339166

key: test_accuracy
value: [0.85714286 0.78571429 0.85714286 0.85714286 0.85714286 0.92857143
 0.78571429 0.71428571 0.78571429 0.71428571]

mean value: 0.8142857142857143

key: train_accuracy
value: [0.96031746 0.95238095 0.94444444 0.94444444 0.94444444 0.95238095
 0.95238095 0.93650794 0.96031746 0.93650794]

mean value: 0.9484126984126984

key: test_fscore
value: [0.83333333 0.8        0.875      0.875      0.85714286 0.93333333
 0.82352941 0.6        0.76923077 0.66666667]

mean value: 0.8033236371471666

key: train_fscore
value: [0.96       0.9516129  0.944      0.94308943 0.94308943 0.95081967
 0.9516129  0.93548387 0.96       0.93442623]

mean value: 0.9474134440847317

key: test_precision
value: [1.         0.75       0.77777778 0.77777778 0.85714286 0.875
 0.7        1.         0.83333333 0.8       ]

mean value: 0.8371031746031746

key: train_precision
value: [0.96774194 0.96721311 0.9516129  0.96666667 0.96666667 0.98305085
 0.96721311 0.95081967 0.96774194 0.96610169]

mean value: 0.9654828551539107

key: test_recall
value: [0.71428571 0.85714286 1.         1.         0.85714286 1.
 1.         0.42857143 0.71428571 0.57142857]

mean value: 0.8142857142857143

key: train_recall
value: [0.95238095 0.93650794 0.93650794 0.92063492 0.92063492 0.92063492
 0.93650794 0.92063492 0.95238095 0.9047619 ]

mean value: 0.9301587301587302

key: test_roc_auc
value: [0.85714286 0.78571429 0.85714286 0.85714286 0.85714286 0.92857143
 0.78571429 0.71428571 0.78571429 0.71428571]

mean value: 0.8142857142857144

key: train_roc_auc
value: [0.96031746 0.95238095 0.94444444 0.94444444 0.94444444 0.95238095
 0.95238095 0.93650794 0.96031746 0.93650794]

mean value: 0.9484126984126984

key: test_jcc
value: [0.71428571 0.66666667 0.77777778 0.77777778 0.75       0.875
 0.7        0.42857143 0.625      0.5       ]

mean value: 0.6815079365079365

key: train_jcc
value: [0.92307692 0.90769231 0.89393939 0.89230769 0.89230769 0.90625
 0.90769231 0.87878788 0.92307692 0.87692308]

mean value: 0.9002054195804196

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.64419675 0.75507832 0.65785503 0.85170484 0.89698648 0.7391665
 0.64110851 0.8319416  0.65485907 0.68134093]

mean value: 0.7354238033294678

key: score_time
value: [0.01294661 0.02320194 0.0123992  0.02117777 0.01676965 0.01436758
 0.01205087 0.01219463 0.01197934 0.01209545]

mean value: 0.014918303489685059

key: test_mcc
value: [0.71428571 0.4472136  0.74535599 0.74535599 0.71428571 0.74535599
 0.63245553 0.74535599 0.74535599 0.57735027]

mean value: 0.6812370787794337

key: train_mcc
value: [1.         1.         0.96825397 1.         1.         1.
 0.98425098 0.95250095 0.96825397 0.96825397]

mean value: 0.984151384151481

key: test_accuracy
value: [0.85714286 0.71428571 0.85714286 0.85714286 0.85714286 0.85714286
 0.78571429 0.85714286 0.85714286 0.78571429]

mean value: 0.8285714285714285

key: train_accuracy
value: [1.         1.         0.98412698 1.         1.         1.
 0.99206349 0.97619048 0.98412698 0.98412698]

mean value: 0.9920634920634921

key: test_fscore
value: [0.85714286 0.75       0.875      0.875      0.85714286 0.875
 0.82352941 0.83333333 0.83333333 0.76923077]

mean value: 0.8348712561947856

key: train_fscore
value: [1.         1.         0.98412698 1.         1.         1.
 0.992      0.976      0.98412698 0.98412698]

mean value: 0.9920380952380952

key: test_precision
value: [0.85714286 0.66666667 0.77777778 0.77777778 0.85714286 0.77777778
 0.7        1.         1.         0.83333333]

mean value: 0.8247619047619048

key: train_precision
value: [1.         1.         0.98412698 1.         1.         1.
 1.         0.98387097 0.98412698 0.98412698]

mean value: 0.9936251920122887

key: test_recall
value: [0.85714286 0.85714286 1.         1.         0.85714286 1.
 1.         0.71428571 0.71428571 0.71428571]

mean value: 0.8714285714285714

key: train_recall
value: [1.         1.         0.98412698 1.         1.         1.
 0.98412698 0.96825397 0.98412698 0.98412698]

mean value: 0.9904761904761905

key: test_roc_auc
value: [0.85714286 0.71428571 0.85714286 0.85714286 0.85714286 0.85714286
 0.78571429 0.85714286 0.85714286 0.78571429]

mean value: 0.8285714285714286

key: train_roc_auc
value: [1.         1.         0.98412698 1.         1.         1.
 0.99206349 0.97619048 0.98412698 0.98412698]

mean value: 0.9920634920634921

key: test_jcc
value: [0.75       0.6        0.77777778 0.77777778 0.75       0.77777778
 0.7        0.71428571 0.71428571 0.625     ]

mean value: 0.7186904761904762

key: train_jcc
value: [1.         1.         0.96875    1.         1.         1.
 0.98412698 0.953125   0.96875    0.96875   ]

mean value: 0.9843501984126984

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01220131 0.01074219 0.00907183 0.0087285  0.00857234 0.008811
 0.00867271 0.00884128 0.00853682 0.00868058]

mean value: 0.009285855293273925

key: score_time
value: [0.01339674 0.00990987 0.0088923  0.0088439  0.00893879 0.00867295
 0.0085187  0.00884485 0.00906515 0.00866961]

mean value: 0.009375286102294923

key: test_mcc
value: [ 0.         -0.17407766  0.4472136   0.31622777  0.4472136   0.40824829
  0.40824829  0.63245553  0.28867513 -0.28867513]

mean value: 0.24855294140224576

key: train_mcc
value: [0.50468255 0.40406102 0.50124029 0.55555556 0.43956222 0.44905021
 0.4662283  0.53975054 0.37745516 0.45421998]

mean value: 0.4691805826659309

key: test_accuracy
value: [0.5        0.42857143 0.71428571 0.64285714 0.71428571 0.64285714
 0.64285714 0.78571429 0.64285714 0.35714286]

mean value: 0.6071428571428572

key: train_accuracy
value: [0.74603175 0.69047619 0.74603175 0.77777778 0.71428571 0.72222222
 0.73015873 0.76984127 0.68253968 0.72222222]

mean value: 0.7301587301587301

key: test_fscore
value: [0.53333333 0.55555556 0.75       0.70588235 0.75       0.73684211
 0.73684211 0.72727273 0.66666667 0.4       ]

mean value: 0.6562394846295775

key: train_fscore
value: [0.77142857 0.73469388 0.76811594 0.77777778 0.74285714 0.74074074
 0.75       0.77165354 0.71830986 0.74820144]

mean value: 0.7523778893695175

key: test_precision
value: [0.5        0.45454545 0.66666667 0.6        0.66666667 0.58333333
 0.58333333 1.         0.625      0.375     ]

mean value: 0.6054545454545455

key: train_precision
value: [0.7012987  0.64285714 0.70666667 0.77777778 0.67532468 0.69444444
 0.69863014 0.765625   0.64556962 0.68421053]

mean value: 0.6992404691924664

key: test_recall
value: [0.57142857 0.71428571 0.85714286 0.85714286 0.85714286 1.
 1.         0.57142857 0.71428571 0.42857143]

mean value: 0.7571428571428571

key: train_recall
value: [0.85714286 0.85714286 0.84126984 0.77777778 0.82539683 0.79365079
 0.80952381 0.77777778 0.80952381 0.82539683]

mean value: 0.8174603174603174

key: test_roc_auc
value: [0.5        0.42857143 0.71428571 0.64285714 0.71428571 0.64285714
 0.64285714 0.78571429 0.64285714 0.35714286]

mean value: 0.6071428571428571

key: train_roc_auc
value: [0.74603175 0.69047619 0.74603175 0.77777778 0.71428571 0.72222222
 0.73015873 0.76984127 0.68253968 0.72222222]

mean value: 0.7301587301587301

key: test_jcc
value: [0.36363636 0.38461538 0.6        0.54545455 0.6        0.58333333
 0.58333333 0.57142857 0.5        0.25      ]

mean value: 0.4981801531801532

key: train_jcc
value: [0.62790698 0.58064516 0.62352941 0.63636364 0.59090909 0.58823529
 0.6        0.62820513 0.56043956 0.59770115]

mean value: 0.6033935409259565

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00905013 0.00918555 0.00909519 0.00984001 0.00886965 0.00883627
 0.00905323 0.00987196 0.00902081 0.00882959]

mean value: 0.009165239334106446

key: score_time
value: [0.00879502 0.008955   0.00891137 0.00917125 0.00870037 0.00867844
 0.00872588 0.00947595 0.0086875  0.0088346 ]

mean value: 0.008893537521362304

key: test_mcc
value: [0.28867513 0.1490712  0.57735027 0.52223297 0.57735027 0.31622777
 0.17407766 0.71428571 0.4472136  0.1490712 ]

mean value: 0.39155557695993376

key: train_mcc
value: [0.51346746 0.55566238 0.54156609 0.50874391 0.50874391 0.52297636
 0.50874391 0.53724272 0.58399712 0.51890567]

mean value: 0.5300049520306789

key: test_accuracy
value: [0.64285714 0.57142857 0.78571429 0.71428571 0.78571429 0.64285714
 0.57142857 0.85714286 0.71428571 0.57142857]

mean value: 0.6857142857142857

key: train_accuracy
value: [0.74603175 0.76984127 0.76190476 0.74603175 0.74603175 0.75396825
 0.74603175 0.76190476 0.78571429 0.74603175]

mean value: 0.7563492063492063

key: test_fscore
value: [0.66666667 0.625      0.8        0.77777778 0.8        0.70588235
 0.66666667 0.85714286 0.75       0.625     ]

mean value: 0.7274136321195145

key: train_fscore
value: [0.77777778 0.79432624 0.78873239 0.77464789 0.77464789 0.78014184
 0.77464789 0.78571429 0.8057554  0.78082192]

mean value: 0.7837213518428147

key: test_precision
value: [0.625      0.55555556 0.75       0.63636364 0.75       0.6
 0.54545455 0.85714286 0.66666667 0.55555556]

mean value: 0.6541738816738817

key: train_precision
value: [0.69135802 0.71794872 0.70886076 0.69620253 0.69620253 0.70512821
 0.69620253 0.71428571 0.73684211 0.68674699]

mean value: 0.7049778109699341

key: test_recall
value: [0.71428571 0.71428571 0.85714286 1.         0.85714286 0.85714286
 0.85714286 0.85714286 0.85714286 0.71428571]

mean value: 0.8285714285714285

key: train_recall
value: [0.88888889 0.88888889 0.88888889 0.87301587 0.87301587 0.87301587
 0.87301587 0.87301587 0.88888889 0.9047619 ]

mean value: 0.8825396825396825

key: test_roc_auc
value: [0.64285714 0.57142857 0.78571429 0.71428571 0.78571429 0.64285714
 0.57142857 0.85714286 0.71428571 0.57142857]

mean value: 0.6857142857142857

key: train_roc_auc
value: [0.74603175 0.76984127 0.76190476 0.74603175 0.74603175 0.75396825
 0.74603175 0.76190476 0.78571429 0.74603175]

mean value: 0.7563492063492063

key: test_jcc
value: [0.5        0.45454545 0.66666667 0.63636364 0.66666667 0.54545455
 0.5        0.75       0.6        0.45454545]

mean value: 0.5774242424242424

key: train_jcc
value: [0.63636364 0.65882353 0.65116279 0.63218391 0.63218391 0.63953488
 0.63218391 0.64705882 0.6746988  0.64044944]

mean value: 0.6444643621244319

MCC on Blind test: 0.0

Accuracy on Blind test: 0.5

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00863433 0.00944114 0.00866365 0.00854492 0.00874066 0.00937176
 0.00906944 0.00859618 0.00928903 0.00939155]

mean value: 0.008974266052246094

key: score_time
value: [0.01479411 0.01020455 0.00980353 0.00954485 0.01014256 0.00992489
 0.01043534 0.00978708 0.01040745 0.01025319]

mean value: 0.010529756546020508

key: test_mcc
value: [0.17407766 0.4472136  0.8660254  0.         0.42857143 0.4472136
 0.52223297 0.71428571 0.4472136  0.14285714]

mean value: 0.41896910998213893

key: train_mcc
value: [0.62699668 0.57438828 0.59484301 0.68565632 0.63059263 0.6415003
 0.67082039 0.65915036 0.69526879 0.58834841]

mean value: 0.6367565156928385

key: test_accuracy
value: [0.57142857 0.71428571 0.92857143 0.5        0.71428571 0.71428571
 0.71428571 0.85714286 0.71428571 0.57142857]

mean value: 0.7

key: train_accuracy
value: [0.80952381 0.77777778 0.79365079 0.84126984 0.80952381 0.81746032
 0.83333333 0.82539683 0.84126984 0.78571429]

mean value: 0.8134920634920635

key: test_fscore
value: [0.66666667 0.75       0.93333333 0.58823529 0.71428571 0.75
 0.77777778 0.85714286 0.75       0.57142857]

mean value: 0.7358870214752568

key: train_fscore
value: [0.82352941 0.8028169  0.80882353 0.84848485 0.82608696 0.82962963
 0.84210526 0.83823529 0.85507246 0.80851064]

mean value: 0.8283294936562668

key: test_precision
value: [0.54545455 0.66666667 0.875      0.5        0.71428571 0.66666667
 0.63636364 0.85714286 0.66666667 0.57142857]

mean value: 0.6699675324675325

key: train_precision
value: [0.76712329 0.72151899 0.75342466 0.8115942  0.76       0.77777778
 0.8        0.78082192 0.78666667 0.73076923]

mean value: 0.7689696728467696

key: test_recall
value: [0.85714286 0.85714286 1.         0.71428571 0.71428571 0.85714286
 1.         0.85714286 0.85714286 0.57142857]

mean value: 0.8285714285714285

key: train_recall
value: [0.88888889 0.9047619  0.87301587 0.88888889 0.9047619  0.88888889
 0.88888889 0.9047619  0.93650794 0.9047619 ]

mean value: 0.8984126984126984

key: test_roc_auc
value: [0.57142857 0.71428571 0.92857143 0.5        0.71428571 0.71428571
 0.71428571 0.85714286 0.71428571 0.57142857]

mean value: 0.7

key: train_roc_auc
value: [0.80952381 0.77777778 0.79365079 0.84126984 0.80952381 0.81746032
 0.83333333 0.82539683 0.84126984 0.78571429]

mean value: 0.8134920634920635

key: test_jcc
value: [0.5        0.6        0.875      0.41666667 0.55555556 0.6
 0.63636364 0.75       0.6        0.4       ]

mean value: 0.5933585858585858

key: train_jcc
value: [0.7        0.67058824 0.67901235 0.73684211 0.7037037  0.70886076
 0.72727273 0.72151899 0.74683544 0.67857143]

mean value: 0.7073205735657565

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01068568 0.01109004 0.0109911  0.01106501 0.0109973  0.01087308
 0.00957727 0.01096416 0.01100779 0.01103806]

mean value: 0.010828948020935059

key: score_time
value: [0.00986814 0.00978184 0.00979042 0.00876927 0.00898647 0.00972056
 0.00877142 0.00978684 0.00980353 0.00994635]

mean value: 0.009522485733032226

key: test_mcc
value: [0.74535599 0.57735027 0.31622777 0.71428571 0.4472136  0.71428571
 0.74535599 0.31622777 0.42857143 0.        ]

mean value: 0.5004874238865976

key: train_mcc
value: [0.92075092 0.93650794 0.85811633 0.88989842 0.85985517 0.88989842
 0.9369802  0.88900089 0.88900089 0.87301587]

mean value: 0.89430250462966

key: test_accuracy
value: [0.85714286 0.78571429 0.64285714 0.85714286 0.71428571 0.85714286
 0.85714286 0.64285714 0.71428571 0.5       ]

mean value: 0.7428571428571429

key: train_accuracy
value: [0.96031746 0.96825397 0.92857143 0.94444444 0.92857143 0.94444444
 0.96825397 0.94444444 0.94444444 0.93650794]

mean value: 0.9468253968253968

key: test_fscore
value: [0.83333333 0.8        0.70588235 0.85714286 0.66666667 0.85714286
 0.875      0.54545455 0.71428571 0.53333333]

mean value: 0.7388241660300483

key: train_fscore
value: [0.96       0.96825397 0.92682927 0.94308943 0.92561983 0.94308943
 0.96774194 0.944      0.944      0.93650794]

mean value: 0.945913180503782

key: test_precision
value: [1.         0.75       0.6        0.85714286 0.8        0.85714286
 0.77777778 0.75       0.71428571 0.5       ]

mean value: 0.7606349206349207

key: train_precision
value: [0.96774194 0.96825397 0.95       0.96666667 0.96551724 0.96666667
 0.98360656 0.9516129  0.9516129  0.93650794]

mean value: 0.9608186778787081

key: test_recall
value: [0.71428571 0.85714286 0.85714286 0.85714286 0.57142857 0.85714286
 1.         0.42857143 0.71428571 0.57142857]

mean value: 0.7428571428571429

key: train_recall
value: [0.95238095 0.96825397 0.9047619  0.92063492 0.88888889 0.92063492
 0.95238095 0.93650794 0.93650794 0.93650794]

mean value: 0.9317460317460318

key: test_roc_auc
value: [0.85714286 0.78571429 0.64285714 0.85714286 0.71428571 0.85714286
 0.85714286 0.64285714 0.71428571 0.5       ]

mean value: 0.7428571428571429

key: train_roc_auc
value: [0.96031746 0.96825397 0.92857143 0.94444444 0.92857143 0.94444444
 0.96825397 0.94444444 0.94444444 0.93650794]

mean value: 0.9468253968253968

key: test_jcc
value: [0.71428571 0.66666667 0.54545455 0.75       0.5        0.75
 0.77777778 0.375      0.55555556 0.36363636]

mean value: 0.5998376623376623

key: train_jcc
value: [0.92307692 0.93846154 0.86363636 0.89230769 0.86153846 0.89230769
 0.9375     0.89393939 0.89393939 0.88059701]

mean value: 0.8977304474132832

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.65259933 0.55132651 0.52340722 0.52382731 0.71559811 0.55424976
 0.4957695  0.56733274 0.7447772  0.56227398]

mean value: 0.5891161680221557

key: score_time
value: [0.01228452 0.01230955 0.01233864 0.01230621 0.0232439  0.01236844
 0.01233506 0.01241755 0.01222134 0.0124197 ]

mean value: 0.013424491882324219

key: test_mcc
value: [0.57735027 0.1490712  0.74535599 0.74535599 0.71428571 0.4472136
 0.52223297 0.8660254  0.74535599 0.57735027]

mean value: 0.6089597395816232

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.57142857 0.85714286 0.85714286 0.85714286 0.71428571
 0.71428571 0.92857143 0.85714286 0.78571429]

mean value: 0.7928571428571428

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        0.625      0.875      0.875      0.85714286 0.75
 0.77777778 0.92307692 0.83333333 0.76923077]

mean value: 0.8085561660561661

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.55555556 0.77777778 0.77777778 0.85714286 0.66666667
 0.63636364 1.         1.         0.83333333]

mean value: 0.7854617604617604

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.71428571 1.         1.         0.85714286 0.85714286
 1.         0.85714286 0.71428571 0.71428571]

mean value: 0.8571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.57142857 0.85714286 0.85714286 0.85714286 0.71428571
 0.71428571 0.92857143 0.85714286 0.78571429]

mean value: 0.7928571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 0.45454545 0.77777778 0.77777778 0.75       0.6
 0.63636364 0.85714286 0.71428571 0.625     ]

mean value: 0.6859559884559885

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01516175 0.01404548 0.01218319 0.0120542  0.01173067 0.01125073
 0.01186585 0.0115962  0.01139712 0.01205778]

mean value: 0.012334299087524415

key: score_time
value: [0.01173472 0.009022   0.00886035 0.00956607 0.00870371 0.00868201
 0.00949168 0.00859475 0.00865388 0.00905395]

mean value: 0.009236311912536621

key: test_mcc
value: [1.         0.71428571 1.         0.71428571 1.         0.71428571
 0.74535599 0.71428571 0.74535599 0.8660254 ]

mean value: 0.8213880245927155

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 0.85714286 0.85714286 0.85714286 0.92857143]

mean value: 0.9071428571428571

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 0.875      0.85714286 0.83333333 0.93333333]

mean value: 0.9070238095238095

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 0.77777778 0.85714286 1.         0.875     ]

mean value: 0.9081349206349206

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 0.71428571 1.        ]

mean value: 0.9142857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 0.85714286 0.85714286 0.85714286 0.92857143]

mean value: 0.9071428571428571

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.75       1.         0.75       1.         0.75
 0.77777778 0.75       0.71428571 0.875     ]

mean value: 0.8367063492063492

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.67

Accuracy on Blind test: 0.8

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08879137 0.08702278 0.08695221 0.0866487  0.08714247 0.08788347
 0.08967614 0.08844662 0.09147382 0.09041929]

mean value: 0.08844568729400634

key: score_time
value: [0.01770091 0.01726842 0.01739073 0.017308   0.01802349 0.0170784
 0.01703453 0.01725245 0.01726747 0.01859736]

mean value: 0.017492175102233887

key: test_mcc
value: [0.74535599 0.57735027 0.74535599 0.4472136  0.57735027 0.74535599
 0.31622777 0.74535599 0.8660254  0.42857143]

mean value: 0.6194162702251634

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.78571429 0.85714286 0.71428571 0.78571429 0.85714286
 0.64285714 0.85714286 0.92857143 0.71428571]

mean value: 0.8

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.8        0.875      0.75       0.76923077 0.875
 0.70588235 0.83333333 0.92307692 0.71428571]

mean value: 0.8079142426201249

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       0.77777778 0.66666667 0.83333333 0.77777778
 0.6        1.         1.         0.71428571]

mean value: 0.811984126984127

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 0.71428571 1.
 0.85714286 0.71428571 0.85714286 0.71428571]

mean value: 0.8285714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.78571429 0.85714286 0.71428571 0.78571429 0.85714286
 0.64285714 0.85714286 0.92857143 0.71428571]

mean value: 0.8

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.66666667 0.77777778 0.6        0.625      0.77777778
 0.54545455 0.71428571 0.85714286 0.55555556]

mean value: 0.6833946608946608

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00893784 0.00919008 0.00887132 0.0088284  0.00957584 0.00873303
 0.00881243 0.00870514 0.00895    0.00870538]

mean value: 0.00893094539642334

key: score_time
value: [0.00892377 0.0087471  0.00869346 0.00878525 0.00857949 0.00855255
 0.00867772 0.00869751 0.00853705 0.00853515]

mean value: 0.00867290496826172

key: test_mcc
value: [0.4472136  0.         0.31622777 0.57735027 0.28867513 0.
 0.28867513 0.1490712  0.8660254  0.63245553]

mean value: 0.3565694034214148

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.5        0.64285714 0.78571429 0.64285714 0.5
 0.64285714 0.57142857 0.92857143 0.78571429]

mean value: 0.6714285714285715

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.53333333 0.70588235 0.8        0.66666667 0.36363636
 0.61538462 0.625      0.93333333 0.72727273]

mean value: 0.6720509392568216

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.5        0.6        0.75       0.625      0.5
 0.66666667 0.55555556 0.875      1.        ]

mean value: 0.6738888888888889

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.57142857 0.85714286 0.85714286 0.71428571 0.28571429
 0.57142857 0.71428571 1.         0.57142857]

mean value: 0.7

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.5        0.64285714 0.78571429 0.64285714 0.5
 0.64285714 0.57142857 0.92857143 0.78571429]

mean value: 0.6714285714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.36363636 0.54545455 0.66666667 0.5        0.22222222
 0.44444444 0.45454545 0.875      0.57142857]

mean value: 0.5243398268398268

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.53

Accuracy on Blind test: 0.7

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.13994312 1.12470388 1.12413239 1.10539031 1.11974645 1.11329842
 1.13287234 1.18959332 1.16923952 1.10620642]

mean value: 1.132512617111206

key: score_time
value: [0.09226322 0.09364581 0.08707714 0.08611917 0.08741522 0.08654284
 0.09422922 0.09448171 0.08700848 0.09361315]

mean value: 0.09023959636688232

key: test_mcc
value: [0.63245553 0.57735027 0.8660254  0.8660254  0.71428571 0.74535599
 0.31622777 0.8660254  0.8660254  0.57735027]

mean value: 0.7027127158353165

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.78571429 0.92857143 0.92857143 0.85714286 0.85714286
 0.64285714 0.92857143 0.92857143 0.78571429]

mean value: 0.8428571428571429

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.8        0.92307692 0.93333333 0.85714286 0.875
 0.70588235 0.92307692 0.92307692 0.76923077]

mean value: 0.8437092809151633

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       1.         0.875      0.85714286 0.77777778
 0.6        1.         1.         0.83333333]

mean value: 0.8693253968253968

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.57142857 0.85714286 0.85714286 1.         0.85714286 1.
 0.85714286 0.85714286 0.85714286 0.71428571]

mean value: 0.8428571428571429

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.78571429 0.92857143 0.92857143 0.85714286 0.85714286
 0.64285714 0.92857143 0.92857143 0.78571429]

mean value: 0.8428571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
[0.57142857 0.66666667 0.85714286 0.875      0.75       0.77777778
 0.54545455 0.85714286 0.85714286 0.625     ]

mean value: 0.7382756132756132

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.82786345 0.85628772 0.86232376 0.87840986 0.8767724  0.84112525
 0.86769676 0.88985515 0.93869185 0.93964982]

mean value: 0.877867603302002

key: score_time
value: [0.21467733 0.18964672 0.21193528 0.17583776 0.15622592 0.23833346
 0.22759962 0.23133421 0.20800018 0.16814971]

mean value: 0.2021740198135376

key: test_mcc
value: [0.74535599 0.57735027 0.8660254  0.74535599 0.71428571 0.74535599
 0.4472136  0.8660254  0.8660254  0.63245553]

mean value: 0.720544929986208

key: train_mcc
value: [0.93650794 0.95250095 0.95250095 0.95250095 0.95346259 0.9216805
 0.95250095 0.98425098 0.96825397 0.9216805 ]

mean value: 0.94958402941395

key: test_accuracy
value: [0.85714286 0.78571429 0.92857143 0.85714286 0.85714286 0.85714286
 0.71428571 0.92857143 0.92857143 0.78571429]

mean value: 0.85

key: train_accuracy
value: [0.96825397 0.97619048 0.97619048 0.97619048 0.97619048 0.96031746
 0.97619048 0.99206349 0.98412698 0.96031746]

mean value: 0.9746031746031746

key: test_fscore
value: [0.83333333 0.8        0.92307692 0.875      0.85714286 0.875
 0.75       0.92307692 0.92307692 0.72727273]

mean value: 0.8486979686979687

key: train_fscore
value: [0.96825397 0.976      0.976      0.976      0.97560976 0.95934959
 0.976      0.992      0.98412698 0.95934959]

mean value: 0.9742689895470383

key: test_precision
value: [1.         0.75       1.         0.77777778 0.85714286 0.77777778
 0.66666667 1.         1.         1.        ]

mean value: 0.8829365079365079

key: train_precision
value: [0.96825397 0.98387097 0.98387097 0.98387097 1.         0.98333333
 0.98387097 1.         0.98412698 0.98333333]

mean value: 0.9854531490015361

key: test_recall
value: [0.71428571 0.85714286 0.85714286 1.         0.85714286 1.
 0.85714286 0.85714286 0.85714286 0.57142857]

mean value: 0.8428571428571429

key: train_recall
value: [0.96825397 0.96825397 0.96825397 0.96825397 0.95238095 0.93650794
 0.96825397 0.98412698 0.98412698 0.93650794]

mean value: 0.9634920634920635

key: test_roc_auc
value: [0.85714286 0.78571429 0.92857143 0.85714286 0.85714286 0.85714286
 0.71428571 0.92857143 0.92857143 0.78571429]

mean value: 0.8500000000000001

key: train_roc_auc
value: [0.96825397 0.97619048 0.97619048 0.97619048 0.97619048 0.96031746
 0.97619048 0.99206349 0.98412698 0.96031746]

mean value: 0.9746031746031747

key: test_jcc
value: [0.71428571 0.66666667 0.85714286 0.77777778 0.75       0.77777778
 0.6        0.85714286 0.85714286 0.57142857]

mean value: 0.7429365079365079

key: train_jcc
value: [0.93846154 0.953125   0.953125   0.953125   0.95238095 0.921875
 0.953125   0.98412698 0.96875    0.921875  ]

mean value: 0.9499969474969475

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01653957 0.01010847 0.00984597 0.00914907 0.0092411  0.009938
 0.01003265 0.00987983 0.00996757 0.00906849]

mean value: 0.010377073287963867

key: score_time
value: [0.00953102 0.00949621 0.00962806 0.00897574 0.00871825 0.00933528
 0.00961423 0.00881052 0.00943184 0.00919628]

mean value: 0.009273743629455567

key: test_mcc
value: [0.28867513 0.1490712  0.57735027 0.52223297 0.57735027 0.31622777
 0.17407766 0.71428571 0.4472136  0.1490712 ]

mean value: 0.39155557695993376

key: train_mcc
value: [0.51346746 0.55566238 0.54156609 0.50874391 0.50874391 0.52297636
 0.50874391 0.53724272 0.58399712 0.51890567]

mean value: 0.5300049520306789

key: test_accuracy
value: [0.64285714 0.57142857 0.78571429 0.71428571 0.78571429 0.64285714
 0.57142857 0.85714286 0.71428571 0.57142857]

mean value: 0.6857142857142857

key: train_accuracy
value: [0.74603175 0.76984127 0.76190476 0.74603175 0.74603175 0.75396825
 0.74603175 0.76190476 0.78571429 0.74603175]

mean value: 0.7563492063492063

key: test_fscore
value: [0.66666667 0.625      0.8        0.77777778 0.8        0.70588235
 0.66666667 0.85714286 0.75       0.625     ]

mean value: 0.7274136321195145

key: train_fscore
value: [0.77777778 0.79432624 0.78873239 0.77464789 0.77464789 0.78014184
 0.77464789 0.78571429 0.8057554  0.78082192]

mean value: 0.7837213518428147

key: test_precision
value: [0.625      0.55555556 0.75       0.63636364 0.75       0.6
 0.54545455 0.85714286 0.66666667 0.55555556]

mean value: 0.6541738816738817

key: train_precision
value: [0.69135802 0.71794872 0.70886076 0.69620253 0.69620253 0.70512821
 0.69620253 0.71428571 0.73684211 0.68674699]

mean value: 0.7049778109699341

key: test_recall
value: [0.71428571 0.71428571 0.85714286 1.         0.85714286 0.85714286
 0.85714286 0.85714286 0.85714286 0.71428571]

mean value: 0.8285714285714285

key: train_recall
value: [0.88888889 0.88888889 0.88888889 0.87301587 0.87301587 0.87301587
 0.87301587 0.87301587 0.88888889 0.9047619 ]

mean value: 0.8825396825396825

key: test_roc_auc
value: [0.64285714 0.57142857 0.78571429 0.71428571 0.78571429 0.64285714
 0.57142857 0.85714286 0.71428571 0.57142857]

mean value: 0.6857142857142857

key: train_roc_auc
value: [0.74603175 0.76984127 0.76190476 0.74603175 0.74603175 0.75396825
 0.74603175 0.76190476 0.78571429 0.74603175]

mean value: 0.7563492063492063

key: test_jcc
value: [0.5        0.45454545 0.66666667 0.63636364 0.66666667 0.54545455
 0.5        0.75       0.6        0.45454545]

mean value: 0.5774242424242424

key: train_jcc
value: [0.63636364 0.65882353 0.65116279 0.63218391 0.63218391 0.63953488
 0.63218391 0.64705882 0.6746988  0.64044944]

mean value: 0.6444643621244319

MCC on Blind test: 0.0

Accuracy on Blind test: 0.5

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07304168 0.06317854 0.06117392 0.05121827 0.05126238 0.22504473
 0.04057932 0.04057169 0.0457325  0.04536366]

mean value: 0.06971666812896729

key: score_time
value: [0.01124525 0.01068401 0.01110744 0.01069069 0.01071715 0.01070666
 0.0110786  0.01121998 0.01117444 0.01086068]

mean value: 0.010948491096496583

key: test_mcc
value: [0.8660254  0.71428571 1.         0.8660254  1.         0.8660254
 0.8660254  0.71428571 0.8660254  1.        ]

mean value: 0.8758698447493622

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.85714286 0.92857143 1.        ]

mean value: 0.9357142857142857

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.85714286 1.         0.92307692 1.         0.93333333
 0.93333333 0.85714286 0.92307692 1.        ]

mean value: 0.9360439560439561

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.85714286 1.         1.         1.         0.875
 0.875      0.85714286 1.         1.        ]

mean value: 0.9339285714285714

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         0.85714286 0.85714286 1.        ]

mean value: 0.9428571428571428

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.85714286 0.92857143 1.        ]

mean value: 0.9357142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.75       1.         0.85714286 1.         0.875
 0.875      0.75       0.85714286 1.        ]

mean value: 0.8839285714285714

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02651215 0.04846931 0.04752612 0.04747295 0.04773426 0.05400276
 0.04773259 0.04830027 0.04831672 0.04797649]

mean value: 0.046404361724853516

key: score_time
value: [0.02096605 0.02244663 0.02062106 0.02118969 0.02212191 0.02337909
 0.0236187  0.02378559 0.02339005 0.02385402]

mean value: 0.02253727912902832

key: test_mcc
value: [0.4472136  0.42857143 0.63245553 0.40824829 0.74535599 0.42857143
 0.8660254  0.57735027 0.74535599 0.63245553]

mean value: 0.5911603465147954

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.71428571 0.78571429 0.64285714 0.85714286 0.71428571
 0.92857143 0.78571429 0.85714286 0.78571429]

mean value: 0.7785714285714286

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.71428571 0.82352941 0.73684211 0.875      0.71428571
 0.93333333 0.8        0.875      0.82352941]

mean value: 0.8045805690697332

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.71428571 0.7        0.58333333 0.77777778 0.71428571
 0.875      0.75       0.77777778 0.7       ]

mean value: 0.7259126984126985

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.71428571 1.         1.         1.         0.71428571
 1.         0.85714286 1.         1.        ]

mean value: 0.9142857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.71428571 0.78571429 0.64285714 0.85714286 0.71428571
 0.92857143 0.78571429 0.85714286 0.78571429]

mean value: 0.7785714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.55555556 0.7        0.58333333 0.77777778 0.55555556
 0.875      0.66666667 0.77777778 0.7       ]

mean value: 0.6791666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01848125 0.00909686 0.00875235 0.00871468 0.00875378 0.00881767
 0.00874782 0.00951767 0.00960565 0.00958204]

mean value: 0.010006976127624512

key: score_time
value: [0.00905585 0.00891948 0.00856829 0.0086031  0.00858045 0.00861192
 0.00858426 0.00927019 0.0089469  0.00931072]

mean value: 0.008845114707946777

key: test_mcc
value: [0.         0.         0.         0.31622777 0.28867513 0.71428571
 0.1490712  0.42857143 0.57735027 0.        ]

mean value: 0.24741815111584053

key: train_mcc
value: [0.39562828 0.40700206 0.43656413 0.42177569 0.37444189 0.33296358
 0.37444189 0.49838198 0.35954625 0.31803896]

mean value: 0.39187847170747214

key: test_accuracy
value: [0.5        0.5        0.5        0.64285714 0.64285714 0.85714286
 0.57142857 0.71428571 0.78571429 0.5       ]

mean value: 0.6214285714285714

key: train_accuracy
value: [0.69047619 0.6984127  0.71428571 0.70634921 0.68253968 0.65873016
 0.68253968 0.74603175 0.67460317 0.65079365]

mean value: 0.6904761904761905

key: test_fscore
value: [0.58823529 0.63157895 0.58823529 0.70588235 0.66666667 0.85714286
 0.625      0.71428571 0.8        0.53333333]

mean value: 0.6710360459973462

key: train_fscore
value: [0.72727273 0.72857143 0.73913043 0.73381295 0.71428571 0.70344828
 0.71428571 0.76470588 0.70921986 0.69863014]

mean value: 0.7233363122195821

key: test_precision
value: [0.5        0.5        0.5        0.6        0.625      0.85714286
 0.55555556 0.71428571 0.75       0.5       ]

mean value: 0.6101984126984127

key: train_precision
value: [0.65       0.66233766 0.68       0.67105263 0.64935065 0.62195122
 0.64935065 0.71232877 0.64102564 0.61445783]

mean value: 0.6551855051604334

key: test_recall
value: [0.71428571 0.85714286 0.71428571 0.85714286 0.71428571 0.85714286
 0.71428571 0.71428571 0.85714286 0.57142857]

mean value: 0.7571428571428571

key: train_recall
value: [0.82539683 0.80952381 0.80952381 0.80952381 0.79365079 0.80952381
 0.79365079 0.82539683 0.79365079 0.80952381]

mean value: 0.807936507936508

key: test_roc_auc
value: [0.5        0.5        0.5        0.64285714 0.64285714 0.85714286
 0.57142857 0.71428571 0.78571429 0.5       ]

mean value: 0.6214285714285714

key: train_roc_auc
value: [0.69047619 0.6984127  0.71428571 0.70634921 0.68253968 0.65873016
 0.68253968 0.74603175 0.67460317 0.65079365]

mean value: 0.6904761904761905

key: test_jcc
value: [0.41666667 0.46153846 0.41666667 0.54545455 0.5        0.75
 0.45454545 0.55555556 0.66666667 0.36363636]

mean value: 0.5130730380730381

key: train_jcc
value: [0.57142857 0.57303371 0.5862069  0.57954545 0.55555556 0.54255319
 0.55555556 0.61904762 0.54945055 0.53684211]

mean value: 0.5669219206752718

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01188612 0.01601815 0.01570535 0.01414609 0.01447415 0.0142417
 0.01480913 0.01494217 0.01382351 0.01519036]

mean value: 0.014523673057556152

key: score_time
value: [0.00955725 0.01170158 0.01173687 0.01169157 0.01166677 0.01166749
 0.0117631  0.01186204 0.01185417 0.01204991]

mean value: 0.011555075645446777

key: test_mcc
value: [0.57735027 0.28867513 0.74535599 0.52223297 0.71428571 0.8660254
 0.71428571 0.8660254  0.8660254  0.42857143]

mean value: 0.6588833432647635

key: train_mcc
value: [0.96825397 1.         0.96825397 0.75828754 0.95346259 0.92354815
 0.88014083 0.95250095 0.95250095 0.98425098]

mean value: 0.934119993839683

key: test_accuracy
value: [0.78571429 0.64285714 0.85714286 0.71428571 0.85714286 0.92857143
 0.85714286 0.92857143 0.92857143 0.71428571]

mean value: 0.8214285714285714

key: train_accuracy
value: [0.98412698 1.         0.98412698 0.86507937 0.97619048 0.96031746
 0.93650794 0.97619048 0.97619048 0.99206349]

mean value: 0.9650793650793651

key: test_fscore
value: [0.76923077 0.66666667 0.875      0.77777778 0.85714286 0.93333333
 0.85714286 0.92307692 0.92307692 0.71428571]

mean value: 0.8296733821733822

key: train_fscore
value: [0.98412698 1.         0.98412698 0.88111888 0.97560976 0.95867769
 0.93220339 0.97637795 0.976      0.99212598]

mean value: 0.9660367618259206

key: test_precision
value: [0.83333333 0.625      0.77777778 0.63636364 0.85714286 0.875
 0.85714286 1.         1.         0.71428571]

mean value: 0.8176046176046176

key: train_precision
value: [0.98412698 1.         0.98412698 0.7875     1.         1.
 1.         0.96875    0.98387097 0.984375  ]

mean value: 0.9692749935995904

key: test_recall
value: [0.71428571 0.71428571 1.         1.         0.85714286 1.
 0.85714286 0.85714286 0.85714286 0.71428571]

mean value: 0.8571428571428571

key: train_recall
value: [0.98412698 1.         0.98412698 1.         0.95238095 0.92063492
 0.87301587 0.98412698 0.96825397 1.        ]

mean value: 0.9666666666666667

key: test_roc_auc
value: [0.78571429 0.64285714 0.85714286 0.71428571 0.85714286 0.92857143
 0.85714286 0.92857143 0.92857143 0.71428571]

mean value: 0.8214285714285715

key: train_roc_auc
value: [0.98412698 1.         0.98412698 0.86507937 0.97619048 0.96031746
 0.93650794 0.97619048 0.97619048 0.99206349]

mean value: 0.9650793650793651

key: test_jcc
value: [0.625      0.5        0.77777778 0.63636364 0.75       0.875
 0.75       0.85714286 0.85714286 0.55555556]

mean value: 0.7183982683982684

key: train_jcc
value: [0.96875    1.         0.96875    0.7875     0.95238095 0.92063492
 0.87301587 0.95384615 0.953125   0.984375  ]

mean value: 0.93623778998779

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01435566 0.01289821 0.01348329 0.01348233 0.01295853 0.01291037
 0.01326108 0.01356745 0.01387191 0.01369071]

mean value: 0.013447952270507813

key: score_time
value: [0.00999284 0.01170158 0.01200676 0.011693   0.01173377 0.01174426
 0.01169348 0.0117228  0.01174641 0.0116868 ]

mean value: 0.011572170257568359

key: test_mcc
value: [0.52223297 0.42857143 0.74535599 0.74535599 0.52223297 0.63245553
 0.52223297 0.63245553 0.8660254  0.57735027]

mean value: 0.6194269054213986

key: train_mcc
value: [0.81110711 0.88014083 0.96874225 0.92354815 0.64477154 0.70710678
 0.63245553 0.85207241 0.96825397 0.96825397]

mean value: 0.8356452531817429

key: test_accuracy
value: [0.71428571 0.71428571 0.85714286 0.85714286 0.71428571 0.78571429
 0.71428571 0.78571429 0.92857143 0.78571429]

mean value: 0.7857142857142857

key: train_accuracy
value: [0.8968254  0.93650794 0.98412698 0.96031746 0.79365079 0.83333333
 0.78571429 0.92063492 0.98412698 0.98412698]

mean value: 0.9079365079365079

key: test_fscore
value: [0.77777778 0.71428571 0.875      0.875      0.6        0.82352941
 0.77777778 0.72727273 0.92307692 0.76923077]

mean value: 0.7862951101186395

key: train_fscore
value: [0.90647482 0.93220339 0.984375   0.96183206 0.74       0.85714286
 0.82352941 0.9137931  0.98412698 0.98412698]

mean value: 0.9087604611652902

key: test_precision
value: [0.63636364 0.71428571 0.77777778 0.77777778 1.         0.7
 0.63636364 1.         1.         0.83333333]

mean value: 0.8075901875901876

key: train_precision
value: [0.82894737 1.         0.96923077 0.92647059 1.         0.75
 0.7        1.         0.98412698 0.98412698]

mean value: 0.9142902694141084

key: test_recall
value: [1.         0.71428571 1.         1.         0.42857143 1.
 1.         0.57142857 0.85714286 0.71428571]

mean value: 0.8285714285714285

key: train_recall
value: [1.         0.87301587 1.         1.         0.58730159 1.
 1.         0.84126984 0.98412698 0.98412698]

mean value: 0.926984126984127

key: test_roc_auc
value: [0.71428571 0.71428571 0.85714286 0.85714286 0.71428571 0.78571429
 0.71428571 0.78571429 0.92857143 0.78571429]

mean value: 0.7857142857142857

key: train_roc_auc
value: [0.8968254  0.93650794 0.98412698 0.96031746 0.79365079 0.83333333
 0.78571429 0.92063492 0.98412698 0.98412698]

mean value: 0.9079365079365079

key: test_jcc
value: [0.63636364 0.55555556 0.77777778 0.77777778 0.42857143 0.7
 0.63636364 0.57142857 0.85714286 0.625     ]

mean value: 0.6565981240981241

key: train_jcc
value: [0.82894737 0.87301587 0.96923077 0.92647059 0.58730159 0.75
 0.7        0.84126984 0.96875    0.96875   ]

mean value: 0.8413736027474418

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11607385 0.1076138  0.10608673 0.10151887 0.10518885 0.10832644
 0.10317087 0.10290027 0.10078859 0.10589075]

mean value: 0.10575590133666993

key: score_time
value: [0.01610017 0.01603413 0.01495147 0.01501298 0.01622701 0.0165329
 0.01495528 0.01502419 0.01493001 0.0160799 ]

mean value: 0.015584802627563477

key: test_mcc
value: [0.8660254  0.4472136  1.         0.71428571 1.         0.57735027
 0.8660254  0.71428571 0.8660254  1.        ]

mean value: 0.8051211504614328

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.71428571 1.         0.85714286 1.         0.78571429
 0.92857143 0.85714286 0.92857143 1.        ]

mean value: 0.9

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.75       1.         0.85714286 1.         0.8
 0.93333333 0.85714286 0.92307692 1.        ]

mean value: 0.9054029304029304

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.66666667 1.         0.85714286 1.         0.75
 0.875      0.85714286 1.         1.        ]

mean value: 0.888095238095238

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 0.85714286 1.        ]

mean value: 0.9285714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.71428571 1.         0.85714286 1.         0.78571429
 0.92857143 0.85714286 0.92857143 1.        ]

mean value: 0.9

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.6        1.         0.75       1.         0.66666667
 0.875      0.75       0.85714286 1.        ]

mean value: 0.8373809523809523

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03131676 0.04441977 0.04395151 0.03373003 0.03642392 0.04966235
 0.04267502 0.03279471 0.038481   0.04732513]

mean value: 0.040078020095825194

key: score_time
value: [0.01685858 0.02886391 0.02688169 0.03952265 0.03935456 0.02259636
 0.0331192  0.0186348  0.02615976 0.02439404]

mean value: 0.027638554573059082

key: test_mcc
value: [0.8660254  0.71428571 1.         0.71428571 1.         0.8660254
 1.         0.71428571 0.74535599 1.        ]

mean value: 0.862026394292595

key: train_mcc
value: [0.98425098 0.98425098 0.98425098 1.         0.98425098 1.
 1.         1.         0.96874225 1.        ]

mean value: 0.9905746182632438

key: test_accuracy
value: [0.92857143 0.85714286 1.         0.85714286 1.         0.92857143
 1.         0.85714286 0.85714286 1.        ]

mean value: 0.9285714285714286

key: train_accuracy
value: [0.99206349 0.99206349 0.99206349 1.         0.99206349 1.
 1.         1.         0.98412698 1.        ]

mean value: 0.9952380952380953

key: test_fscore
value: [0.93333333 0.85714286 1.         0.85714286 1.         0.93333333
 1.         0.85714286 0.83333333 1.        ]

mean value: 0.9271428571428572

key: train_fscore
value: [0.99212598 0.99212598 0.992      1.         0.99212598 1.
 1.         1.         0.98387097 1.        ]

mean value: 0.995224892049784

key: test_precision
value: [0.875      0.85714286 1.         0.85714286 1.         0.875
 1.         0.85714286 1.         1.        ]

mean value: 0.9321428571428572

key: train_precision
value: [0.984375 0.984375 1.       1.       0.984375 1.       1.       1.
 1.       1.      ]

mean value: 0.9953125

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         0.85714286 0.71428571 1.        ]

mean value: 0.9285714285714286

key: train_recall
value: [1.         1.         0.98412698 1.         1.         1.
 1.         1.         0.96825397 1.        ]

mean value: 0.9952380952380953

key: test_roc_auc
value: [0.92857143 0.85714286 1.         0.85714286 1.         0.92857143
 1.         0.85714286 0.85714286 1.        ]

mean value: 0.9285714285714286

key: train_roc_auc
value: [0.99206349 0.99206349 0.99206349 1.         0.99206349 1.
 1.         1.         0.98412698 1.        ]

mean value: 0.9952380952380953

key: test_jcc
value: [0.875      0.75       1.         0.75       1.         0.875
 1.         0.75       0.71428571 1.        ]

mean value: 0.8714285714285714

key: train_jcc
value: [0.984375   0.984375   0.98412698 1.         0.984375   1.
 1.         1.         0.96825397 1.        ]

mean value: 0.9905505952380952

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.04321051 0.04301333 0.04298568 0.04127693 0.04286027 0.03904319
 0.0431571  0.04193687 0.04288578 0.04299068]

mean value: 0.042336034774780276

key: score_time
value: [0.01759887 0.02214241 0.02269411 0.02224588 0.02418971 0.02241874
 0.02038097 0.02414989 0.01431775 0.02205825]

mean value: 0.021219658851623534

key: test_mcc
value: [0.1490712  0.         0.74535599 0.31622777 0.57735027 0.4472136
 0.40824829 0.74535599 0.8660254  0.1490712 ]

mean value: 0.4403919706954555

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.57142857 0.5        0.85714286 0.64285714 0.78571429 0.71428571
 0.64285714 0.85714286 0.92857143 0.57142857]

mean value: 0.7071428571428572

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.625      0.63157895 0.875      0.70588235 0.8        0.75
 0.73684211 0.83333333 0.93333333 0.625     ]

mean value: 0.7515970072239422

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.55555556 0.5        0.77777778 0.6        0.75       0.66666667
 0.58333333 1.         0.875      0.55555556]

mean value: 0.6863888888888889

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 0.85714286 0.85714286
 1.         0.71428571 1.         0.71428571]

mean value: 0.8571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.57142857 0.5        0.85714286 0.64285714 0.78571429 0.71428571
 0.64285714 0.85714286 0.92857143 0.57142857]

mean value: 0.7071428571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.45454545 0.46153846 0.77777778 0.54545455 0.66666667 0.6
 0.58333333 0.71428571 0.875      0.45454545]

mean value: 0.6133147408147408

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.25483227 0.27963281 0.24360347 0.20475602 0.24511337 0.2791121
 0.24376607 0.28063536 0.28055573 0.24622631]

mean value: 0.25582334995269773

key: score_time
value: [0.00938106 0.00896335 0.00910473 0.00897121 0.00927019 0.0091157
 0.00909758 0.00937986 0.00938296 0.0091424 ]

mean value: 0.009180903434753418

key: test_mcc
value: [1.         0.71428571 1.         0.71428571 1.         0.71428571
 1.         0.71428571 0.8660254  1.        ]

mean value: 0.8723168260927296

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 0.92857143 1.        ]

mean value: 0.9357142857142857

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 0.92307692 1.        ]

mean value: 0.9351648351648352

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 1.         1.        ]

mean value: 0.9428571428571428

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 0.85714286 1.        ]

mean value: 0.9285714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         0.85714286 0.92857143 1.        ]

mean value: 0.9357142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.75       1.         0.75       1.         0.75
 1.         0.75       0.85714286 1.        ]

mean value: 0.8857142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01523948 0.01647639 0.0168469  0.01656938 0.01645827 0.01667857
 0.01640034 0.01653886 0.01657081 0.01692367]

mean value: 0.01647026538848877

key: score_time
value: [0.0127914  0.01215816 0.01195741 0.01439118 0.01406884 0.01425385
 0.01452756 0.01448226 0.0144906  0.01466036]

mean value: 0.013778162002563477

key: test_mcc
value: [0.40824829 0.63245553 0.63245553 0.52223297 0.31622777 0.31622777
 0.17407766 0.40824829 0.40824829 0.28867513]

mean value: 0.41070972259102206

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.64285714 0.78571429 0.78571429 0.71428571 0.64285714 0.64285714
 0.57142857 0.64285714 0.64285714 0.64285714]

mean value: 0.6714285714285715

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.73684211 0.72727273 0.82352941 0.77777778 0.54545455 0.70588235
 0.66666667 0.73684211 0.73684211 0.66666667]

mean value: 0.712377646433374

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.58333333 1.         0.7        0.63636364 0.75       0.6
 0.54545455 0.58333333 0.58333333 0.625     ]

mean value: 0.6606818181818181

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.57142857 1.         1.         0.42857143 0.85714286
 0.85714286 1.         1.         0.71428571]

mean value: 0.8428571428571429

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.64285714 0.78571429 0.78571429 0.71428571 0.64285714 0.64285714
 0.57142857 0.64285714 0.64285714 0.64285714]

mean value: 0.6714285714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.58333333 0.57142857 0.7        0.63636364 0.375      0.54545455
 0.5        0.58333333 0.58333333 0.5       ]

mean value: 0.5578246753246753

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.6

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02925706 0.03722405 0.02771807 0.0325644  0.03255486 0.03256893
 0.02743649 0.03248382 0.03251648 0.03256011]

mean value: 0.03168842792510986

key: score_time
value: [0.02246881 0.02184463 0.02256989 0.02331829 0.01608825 0.02025676
 0.02089095 0.02181077 0.0225234  0.02335548]

mean value: 0.021512722969055174

key: test_mcc
value: [0.71428571 0.4472136  0.8660254  0.74535599 0.8660254  0.63245553
 0.8660254  0.71428571 0.8660254  0.71428571]

mean value: 0.7431983878028462

key: train_mcc
value: [0.96825397 1.         0.96825397 0.96825397 0.96825397 0.96825397
 0.98425098 0.95250095 0.98425098 0.96825397]

mean value: 0.9730526730528191

key: test_accuracy
value: [0.85714286 0.71428571 0.92857143 0.85714286 0.92857143 0.78571429
 0.92857143 0.85714286 0.92857143 0.85714286]

mean value: 0.8642857142857143

key: train_accuracy
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.99206349 0.97619048 0.99206349 0.98412698]

mean value: 0.9865079365079364

key: test_fscore
value: [0.85714286 0.75       0.93333333 0.875      0.93333333 0.82352941
 0.93333333 0.85714286 0.92307692 0.85714286]

mean value: 0.8743034906270201

key: train_fscore
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.992      0.97637795 0.992      0.98412698]

mean value: 0.986513985751781

key: test_precision
value: [0.85714286 0.66666667 0.875      0.77777778 0.875      0.7
 0.875      0.85714286 1.         0.85714286]

mean value: 0.8340873015873016

key: train_precision
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 1.         0.96875    1.         0.98412698]

mean value: 0.9873511904761905

key: test_recall
value: [0.85714286 0.85714286 1.         1.         1.         1.
 1.         0.85714286 0.85714286 0.85714286]

mean value: 0.9285714285714286

key: train_recall
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:128: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:131: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9857142857142857

key: test_roc_auc
value: [0.85714286 0.71428571 0.92857143 0.85714286 0.92857143 0.78571429
 0.92857143 0.85714286 0.92857143 0.85714286]

mean value: 0.8642857142857143

key: train_roc_auc
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.99206349 0.97619048 0.99206349 0.98412698]

mean value: 0.9865079365079366

key: test_jcc
value: [0.75       0.6        0.875      0.77777778 0.875      0.7
 0.875      0.75       0.85714286 0.75      ]

mean value: 0.7809920634920635

key: train_jcc
value: [0.96875    1.         0.96875    0.96875    0.96875    0.96875
 0.98412698 0.95384615 0.98412698 0.96875   ]

mean value: 0.9734600122100122

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.26872206 0.24485159 0.19331622 0.19329    0.1993072  0.18969822
 0.27916956 0.27131557 0.28865838 0.3392396 ]

mean value: 0.24675683975219725

key: score_time
value: [0.02235699 0.02243185 0.02210402 0.02205634 0.02375603 0.02349758
 0.02142692 0.02302432 0.02361107 0.0212996 ]

mean value: 0.022556471824645995

key: test_mcc
value: [0.71428571 0.4472136  0.8660254  0.74535599 0.8660254  0.63245553
 0.8660254  0.71428571 0.8660254  0.71428571]

mean value: 0.7431983878028462

key: train_mcc
value: [0.96825397 1.         0.96825397 0.96825397 0.96825397 0.96825397
 0.98425098 0.95250095 0.98425098 0.96825397]

mean value: 0.9730526730528191

key: test_accuracy
value: [0.85714286 0.71428571 0.92857143 0.85714286 0.92857143 0.78571429
 0.92857143 0.85714286 0.92857143 0.85714286]

mean value: 0.8642857142857143

key: train_accuracy
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.99206349 0.97619048 0.99206349 0.98412698]

mean value: 0.9865079365079364

key: test_fscore
value: [0.85714286 0.75       0.93333333 0.875      0.93333333 0.82352941
 0.93333333 0.85714286 0.92307692 0.85714286]

mean value: 0.8743034906270201

key: train_fscore
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.992      0.97637795 0.992      0.98412698]

mean value: 0.986513985751781

key: test_precision
value: [0.85714286 0.66666667 0.875      0.77777778 0.875      0.7
 0.875      0.85714286 1.         0.85714286]

mean value: 0.8340873015873016

key: train_precision
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 1.         0.96875    1.         0.98412698]

mean value: 0.9873511904761905

key: test_recall
value: [0.85714286 0.85714286 1.         1.         1.         1.
 1.         0.85714286 0.85714286 0.85714286]

mean value: 0.9285714285714286

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9857142857142857

key: test_roc_auc
value: [0.85714286 0.71428571 0.92857143 0.85714286 0.92857143 0.78571429
 0.92857143 0.85714286 0.92857143 0.85714286]

mean value: 0.8642857142857143

key: train_roc_auc
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.99206349 0.97619048 0.99206349 0.98412698]

mean value: 0.9865079365079366

key: test_jcc
value: [0.75       0.6        0.875      0.77777778 0.875      0.7
 0.875      0.75       0.85714286 0.75      ]

mean value: 0.7809920634920635

key: train_jcc
value: [0.96875    1.         0.96875    0.96875    0.96875    0.96875
 0.98412698 0.95384615 0.98412698 0.96875   ]

mean value: 0.9734600122100122

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02622414 0.0280211  0.02811193 0.0279851  0.02787256 0.0258975
 0.02756739 0.03350663 0.02549291 0.02850413]

mean value: 0.027918338775634766

key: score_time
value: [0.01168752 0.01170516 0.01172113 0.01167536 0.01167822 0.01169229
 0.01169801 0.01164389 0.01171398 0.01167274]

mean value: 0.011688828468322754

key: test_mcc
value: [0.74535599 0.42857143 0.52223297 0.57735027 0.57735027 0.8660254
 0.63245553 0.4472136  0.57735027 0.8660254 ]

mean value: 0.623993113160984

key: train_mcc
value: [0.88900089 0.9369802  0.87345612 0.9047619  0.93650794 0.90659109
 0.95250095 0.9047619  0.88900089 0.9047619 ]

mean value: 0.9098323803395101

key: test_accuracy
value: [0.85714286 0.71428571 0.71428571 0.78571429 0.78571429 0.92857143
 0.78571429 0.71428571 0.78571429 0.92857143]

mean value: 0.8

key: train_accuracy
value: [0.94444444 0.96825397 0.93650794 0.95238095 0.96825397 0.95238095
 0.97619048 0.95238095 0.94444444 0.95238095]

mean value: 0.9547619047619047

key: test_fscore
value: [0.83333333 0.71428571 0.77777778 0.8        0.76923077 0.93333333
 0.82352941 0.66666667 0.8        0.93333333]

mean value: 0.8051490339725633

key: train_fscore
value: [0.94488189 0.96875    0.9375     0.95238095 0.96825397 0.95384615
 0.97637795 0.95238095 0.94488189 0.95238095]

mean value: 0.9551634711526443

key: test_precision
value: [1.         0.71428571 0.63636364 0.75       0.83333333 0.875
 0.7        0.8        0.75       0.875     ]

mean value: 0.7933982683982684

key: train_precision
value: [0.9375     0.95384615 0.92307692 0.95238095 0.96825397 0.92537313
 0.96875    0.95238095 0.9375     0.95238095]

mean value: 0.947144303664826

key: test_recall
value: [0.71428571 0.71428571 1.         0.85714286 0.71428571 1.
 1.         0.57142857 0.85714286 1.        ]

mean value: 0.8428571428571429

key: train_recall
value: [0.95238095 0.98412698 0.95238095 0.95238095 0.96825397 0.98412698
 0.98412698 0.95238095 0.95238095 0.95238095]

mean value: 0.9634920634920634

key: test_roc_auc
value: [0.85714286 0.71428571 0.71428571 0.78571429 0.78571429 0.92857143
 0.78571429 0.71428571 0.78571429 0.92857143]

mean value: 0.8

key: train_roc_auc
value: [0.94444444 0.96825397 0.93650794 0.95238095 0.96825397 0.95238095
 0.97619048 0.95238095 0.94444444 0.95238095]

mean value: 0.9547619047619047

key: test_jcc
value: [0.71428571 0.55555556 0.63636364 0.66666667 0.625      0.875
 0.7        0.5        0.66666667 0.875     ]

mean value: 0.6814538239538239

key: train_jcc
value: [0.89552239 0.93939394 0.88235294 0.90909091 0.93846154 0.91176471
 0.95384615 0.90909091 0.89552239 0.90909091]

mean value: 0.9144136782152585

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.68336606 0.79824805 0.65278387 0.66261816 0.76574349 0.71132708
 0.62404537 0.68007779 0.76493001 0.67316413]

mean value: 0.7016304016113282

key: score_time
value: [0.01466322 0.01438069 0.01444507 0.01473403 0.01458716 0.01554775
 0.01198745 0.01623559 0.01437926 0.01435518]

mean value: 0.014531540870666503

key: test_mcc
value: [0.57735027 0.4472136  0.8660254  0.4472136  0.8660254  0.74535599
 0.63245553 0.74535599 0.8660254  0.71428571]

mean value: 0.6907306902862108

key: train_mcc
value: [1.         1.         0.98425098 1.         1.         1.
 1.         0.98425098 0.98425098 0.98425098]

mean value: 0.9937003937005905

key: test_accuracy
value: [0.78571429 0.71428571 0.92857143 0.71428571 0.92857143 0.85714286
 0.78571429 0.85714286 0.92857143 0.85714286]

mean value: 0.8357142857142857

key: train_accuracy
value: [1.         1.         0.99206349 1.         1.         1.
 1.         0.99206349 0.99206349 0.99206349]

mean value: 0.9968253968253968

key: test_fscore
value: [0.76923077 0.75       0.93333333 0.75       0.93333333 0.875
 0.82352941 0.875      0.93333333 0.85714286]

mean value: 0.8499903038138332

key: train_fscore
value: [1.         1.         0.99212598 1.         1.         1.
 1.         0.99212598 0.99212598 0.99212598]

mean value: 0.9968503937007874

key: test_precision
value: [0.83333333 0.66666667 0.875      0.66666667 0.875      0.77777778
 0.7        0.77777778 0.875      0.85714286]

mean value: 0.7904365079365079

key: train_precision
value: [1.       1.       0.984375 1.       1.       1.       1.       0.984375
 0.984375 0.984375]

mean value: 0.99375

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         0.85714286]

mean value: 0.9285714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.71428571 0.92857143 0.71428571 0.92857143 0.85714286
 0.78571429 0.85714286 0.92857143 0.85714286]

mean value: 0.8357142857142857

key: train_roc_auc
value: [1.         1.         0.99206349 1.         1.         1.
 1.         0.99206349 0.99206349 0.99206349]

mean value: 0.9968253968253968

key: test_jcc
value: [0.625      0.6        0.875      0.6        0.875      0.77777778
 0.7        0.77777778 0.875      0.75      ]

mean value: 0.7455555555555555

key: train_jcc
value: [1.       1.       0.984375 1.       1.       1.       1.       0.984375
 0.984375 0.984375]

mean value: 0.99375

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.0120337  0.00913382 0.00880861 0.00886416 0.00864506 0.00857353
 0.00869036 0.00872898 0.00838923 0.00855565]

mean value: 0.009042310714721679

key: score_time
value: [0.01197982 0.00897551 0.00884962 0.00944877 0.00865412 0.00858283
 0.00853062 0.00851512 0.00845766 0.00850487]

mean value: 0.00904989242553711

key: test_mcc
value: [ 0.         -0.17407766  0.          0.          0.4472136   0.17407766
  0.31622777  0.          0.14285714  0.17407766]

mean value: 0.10803761603296366

key: train_mcc
value: [0.37188796 0.43759497 0.40192095 0.52407367 0.42943789 0.51910855
 0.41027734 0.59825454 0.52512211 0.40422604]

mean value: 0.46219040247551463

key: test_accuracy
value: [0.5        0.42857143 0.5        0.5        0.71428571 0.57142857
 0.64285714 0.5        0.57142857 0.57142857]

mean value: 0.55

key: train_accuracy
value: [0.68253968 0.69047619 0.6984127  0.76190476 0.71428571 0.75396825
 0.6984127  0.79365079 0.74603175 0.6984127 ]

mean value: 0.7238095238095238

key: test_fscore
value: [0.58823529 0.55555556 0.58823529 0.53333333 0.75       0.66666667
 0.70588235 0.         0.57142857 0.66666667]

mean value: 0.5626003734827264

key: train_fscore
value: [0.71014493 0.75159236 0.72058824 0.75806452 0.72307692 0.77697842
 0.73239437 0.77192982 0.78378378 0.72463768]

mean value: 0.7453191031692181

key: test_precision
value: [0.5        0.45454545 0.5        0.5        0.66666667 0.54545455
 0.6        0.         0.57142857 0.54545455]

mean value: 0.4883549783549783

key: train_precision
value: [0.65333333 0.62765957 0.67123288 0.7704918  0.70149254 0.71052632
 0.65822785 0.8627451  0.68235294 0.66666667]

mean value: 0.7004728994878962

key: test_recall
value: [0.71428571 0.71428571 0.71428571 0.57142857 0.85714286 0.85714286
 0.85714286 0.         0.57142857 0.85714286]

mean value: 0.6714285714285714

key: train_recall
value: [0.77777778 0.93650794 0.77777778 0.74603175 0.74603175 0.85714286
 0.82539683 0.6984127  0.92063492 0.79365079]

mean value: 0.807936507936508

key: test_roc_auc
value: [0.5        0.42857143 0.5        0.5        0.71428571 0.57142857
 0.64285714 0.5        0.57142857 0.57142857]

mean value: 0.55

key: train_roc_auc
value: [0.68253968 0.69047619 0.6984127  0.76190476 0.71428571 0.75396825
 0.6984127  0.79365079 0.74603175 0.6984127 ]

mean value: 0.7238095238095238

key: test_jcc
value: [0.41666667 0.38461538 0.41666667 0.36363636 0.6        0.5
 0.54545455 0.         0.4        0.5       ]

mean value: 0.4127039627039627

key: train_jcc
value: [0.5505618  0.60204082 0.56321839 0.61038961 0.56626506 0.63529412
 0.57777778 0.62857143 0.64444444 0.56818182]

mean value: 0.594674526213704

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0091207  0.00891185 0.00888276 0.00885987 0.00969458 0.00874472
 0.00878119 0.0088098  0.00976038 0.00879478]

mean value: 0.009036064147949219

key: score_time
value: [0.00873804 0.00874662 0.00861692 0.00858283 0.00940084 0.00852609
 0.0085516  0.00952411 0.00927234 0.00853014]

mean value: 0.008848953247070312

key: test_mcc
value: [ 0.28867513 -0.1490712   0.4472136   0.31622777  0.4472136   0.8660254
  0.4472136   0.14285714  0.57735027  0.57735027]

mean value: 0.39610555736323716

key: train_mcc
value: [0.70276422 0.68811011 0.65610499 0.72345771 0.65610499 0.72011523
 0.67357531 0.76200076 0.70062273 0.71428571]

mean value: 0.6997141756428337

key: test_accuracy
value: [0.64285714 0.42857143 0.71428571 0.64285714 0.71428571 0.92857143
 0.71428571 0.57142857 0.78571429 0.78571429]

mean value: 0.6928571428571428

key: train_accuracy
value: [0.84920635 0.84126984 0.82539683 0.85714286 0.82539683 0.85714286
 0.83333333 0.88095238 0.84920635 0.85714286]

mean value: 0.8476190476190476

key: test_fscore
value: [0.61538462 0.33333333 0.75       0.70588235 0.66666667 0.93333333
 0.75       0.57142857 0.8        0.8       ]

mean value: 0.6926028873087696

key: train_fscore
value: [0.85714286 0.85074627 0.81355932 0.86764706 0.8358209  0.86567164
 0.84444444 0.88188976 0.85496183 0.85714286]

mean value: 0.8529026941398331

key: test_precision
value: [0.66666667 0.4        0.66666667 0.6        0.8        0.875
 0.66666667 0.57142857 0.75       0.75      ]

mean value: 0.6746428571428571

key: train_precision
value: [0.81428571 0.8028169  0.87272727 0.80821918 0.78873239 0.81690141
 0.79166667 0.875      0.82352941 0.85714286]

mean value: 0.825102180489476

key: test_recall
value: [0.57142857 0.28571429 0.85714286 0.85714286 0.57142857 1.
 0.85714286 0.57142857 0.85714286 0.85714286]

mean value: 0.7285714285714285

key: train_recall
value: [0.9047619  0.9047619  0.76190476 0.93650794 0.88888889 0.92063492
 0.9047619  0.88888889 0.88888889 0.85714286]

mean value: 0.8857142857142857

key: test_roc_auc
value: [0.64285714 0.42857143 0.71428571 0.64285714 0.71428571 0.92857143
 0.71428571 0.57142857 0.78571429 0.78571429]

mean value: 0.6928571428571428

key: train_roc_auc
value: [0.84920635 0.84126984 0.82539683 0.85714286 0.82539683 0.85714286
 0.83333333 0.88095238 0.84920635 0.85714286]

mean value: 0.8476190476190476

key: test_jcc
value: [0.44444444 0.2        0.6        0.54545455 0.5        0.875
 0.6        0.4        0.66666667 0.66666667]

mean value: 0.5498232323232323

key: train_jcc
value: [0.75       0.74025974 0.68571429 0.76623377 0.71794872 0.76315789
 0.73076923 0.78873239 0.74666667 0.75      ]

mean value: 0.7439482696695447

MCC on Blind test: 0.09

Accuracy on Blind test: 0.5

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00861692 0.00846934 0.0086894  0.00878334 0.00939131 0.00949144
 0.00942993 0.00903368 0.00895691 0.00950742]

mean value: 0.009036970138549805

key: score_time
value: [0.0095551  0.00964808 0.0103519  0.009969   0.01026988 0.01024032
 0.01040959 0.00944185 0.01325274 0.01023769]

mean value: 0.010337615013122558

key: test_mcc
value: [ 0.4472136   0.14285714  0.71428571 -0.28867513  0.42857143  0.57735027
  0.1490712   0.28867513  0.          0.4472136 ]

mean value: 0.2906562944403813

key: train_mcc
value: [0.6415003  0.68132997 0.60942528 0.65376533 0.54304508 0.63887656
 0.6032506  0.63692976 0.62029917 0.64482588]

mean value: 0.6273247932485556

key: test_accuracy
value: [0.71428571 0.57142857 0.85714286 0.35714286 0.71428571 0.78571429
 0.57142857 0.64285714 0.5        0.71428571]

mean value: 0.6428571428571429

key: train_accuracy
value: [0.81746032 0.83333333 0.8015873  0.82539683 0.76984127 0.81746032
 0.8015873  0.81746032 0.80952381 0.81746032]

mean value: 0.8111111111111111

key: test_fscore
value: [0.75       0.57142857 0.85714286 0.4        0.71428571 0.8
 0.625      0.61538462 0.46153846 0.66666667]

mean value: 0.6461446886446887

key: train_fscore
value: [0.82962963 0.84892086 0.81481481 0.81666667 0.78195489 0.82706767
 0.80314961 0.82442748 0.81538462 0.83211679]

mean value: 0.8194133021732467

key: test_precision
value: [0.66666667 0.57142857 0.85714286 0.375      0.71428571 0.75
 0.55555556 0.66666667 0.5        0.8       ]

mean value: 0.6456746031746031

key: train_precision
value: [0.77777778 0.77631579 0.76388889 0.85964912 0.74285714 0.78571429
 0.796875   0.79411765 0.79104478 0.77027027]

mean value: 0.7858510700967294

key: test_recall
value: [0.85714286 0.57142857 0.85714286 0.42857143 0.71428571 0.85714286
 0.71428571 0.57142857 0.42857143 0.57142857]

mean value: 0.6571428571428571

key: train_recall
value: [0.88888889 0.93650794 0.87301587 0.77777778 0.82539683 0.87301587
 0.80952381 0.85714286 0.84126984 0.9047619 ]

mean value: 0.8587301587301587

key: test_roc_auc
value: [0.71428571 0.57142857 0.85714286 0.35714286 0.71428571 0.78571429
 0.57142857 0.64285714 0.5        0.71428571]

mean value: 0.6428571428571429

key: train_roc_auc
value: [0.81746032 0.83333333 0.8015873  0.82539683 0.76984127 0.81746032
 0.8015873  0.81746032 0.80952381 0.81746032]

mean value: 0.8111111111111111

key: test_jcc
value: [0.6        0.4        0.75       0.25       0.55555556 0.66666667
 0.45454545 0.44444444 0.3        0.5       ]

mean value: 0.4921212121212121

key: train_jcc
value: [0.70886076 0.7375     0.6875     0.69014085 0.64197531 0.70512821
 0.67105263 0.7012987  0.68831169 0.7125    ]

mean value: 0.694426813952361

MCC on Blind test: -0.17

Accuracy on Blind test: 0.4

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0100832  0.00976014 0.00987196 0.01112866 0.0098362  0.00980902
 0.00974154 0.00981092 0.00974679 0.00981164]

mean value: 0.009960007667541505

key: score_time
value: [0.00885224 0.00879025 0.00884819 0.00890374 0.00895476 0.00908732
 0.00925803 0.00901294 0.0088799  0.00916076]

mean value: 0.008974814414978027

key: test_mcc
value: [0.4472136  0.28867513 0.52223297 0.57735027 0.74535599 0.8660254
 0.63245553 0.4472136  0.4472136  0.63245553]

mean value: 0.5606191618503126

key: train_mcc
value: [0.81116045 0.89170166 0.85811633 0.84169408 0.81116045 0.84169408
 0.84511128 0.85985517 0.84169408 0.81322028]

mean value: 0.8415407870156413

key: test_accuracy
value: [0.71428571 0.64285714 0.71428571 0.78571429 0.85714286 0.92857143
 0.78571429 0.71428571 0.71428571 0.78571429]

mean value: 0.7642857142857142

key: train_accuracy
value: [0.9047619  0.94444444 0.92857143 0.92063492 0.9047619  0.92063492
 0.92063492 0.92857143 0.92063492 0.9047619 ]

mean value: 0.9198412698412698

key: test_fscore
value: [0.66666667 0.66666667 0.77777778 0.8        0.83333333 0.93333333
 0.82352941 0.66666667 0.75       0.82352941]

mean value: 0.7741503267973856

key: train_fscore
value: [0.90769231 0.94656489 0.93023256 0.921875   0.90769231 0.921875
 0.92424242 0.93129771 0.921875   0.90909091]

mean value: 0.922243810227733

key: test_precision
value: [0.8        0.625      0.63636364 0.75       1.         0.875
 0.7        0.8        0.66666667 0.7       ]

mean value: 0.7553030303030303

key: train_precision
value: [0.88059701 0.91176471 0.90909091 0.90769231 0.88059701 0.90769231
 0.88405797 0.89705882 0.90769231 0.86956522]

mean value: 0.895580857983614

key: test_recall
value: [0.57142857 0.71428571 1.         0.85714286 0.71428571 1.
 1.         0.57142857 0.85714286 1.        ]

mean value: 0.8285714285714285

key: train_recall
value: [0.93650794 0.98412698 0.95238095 0.93650794 0.93650794 0.93650794
 0.96825397 0.96825397 0.93650794 0.95238095]

mean value: 0.9507936507936507

key: test_roc_auc
value: [0.71428571 0.64285714 0.71428571 0.78571429 0.85714286 0.92857143
 0.78571429 0.71428571 0.71428571 0.78571429]

mean value: 0.7642857142857143

key: train_roc_auc
value: [0.9047619  0.94444444 0.92857143 0.92063492 0.9047619  0.92063492
 0.92063492 0.92857143 0.92063492 0.9047619 ]

mean value: 0.9198412698412699

key: test_jcc
value: [0.5        0.5        0.63636364 0.66666667 0.71428571 0.875
 0.7        0.5        0.6        0.7       ]

mean value: 0.6392316017316018

key: train_jcc
value: [0.83098592 0.89855072 0.86956522 0.85507246 0.83098592 0.85507246
 0.85915493 0.87142857 0.85507246 0.83333333]

mean value: 0.8559221998658618

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.51826239 0.60860467 0.53846049 0.5424521  0.57185245 0.70954967
 0.54308391 0.59095526 0.54834795 0.65395856]

mean value: 0.5825527429580688

key: score_time
value: [0.01227689 0.01239133 0.01249623 0.01224232 0.01252151 0.01222205
 0.02427602 0.01235914 0.01278067 0.01255131]

mean value: 0.013611745834350587

key: test_mcc
value: [0.57735027 0.42857143 0.8660254  0.57735027 0.8660254  0.74535599
 0.31622777 0.57735027 0.8660254  0.57735027]

mean value: 0.6397632475200016

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.71428571 0.92857143 0.78571429 0.92857143 0.85714286
 0.64285714 0.78571429 0.92857143 0.78571429]

mean value: 0.8142857142857143

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.76923077 0.71428571 0.93333333 0.8        0.93333333 0.875
 0.70588235 0.76923077 0.93333333 0.8       ]

mean value: 0.823362960568843

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.83333333 0.71428571 0.875      0.75       0.875      0.77777778
 0.6        0.83333333 0.875      0.75      ]

mean value: 0.7883730158730159

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.71428571 1.         0.85714286 1.         1.
 0.85714286 0.71428571 1.         0.85714286]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.71428571 0.92857143 0.78571429 0.92857143 0.85714286
 0.64285714 0.78571429 0.92857143 0.78571429]

mean value: 0.8142857142857144

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.625      0.55555556 0.875      0.66666667 0.875      0.77777778
 0.54545455 0.625      0.875      0.66666667]

mean value: 0.7087121212121212

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01565766 0.01219702 0.01145101 0.01032972 0.01085424 0.01028109
 0.01057148 0.01089954 0.01067233 0.01110506]

mean value: 0.011401915550231933

key: score_time
value: [0.01228023 0.00890565 0.00876069 0.00860834 0.0085938  0.00857997
 0.0085113  0.00854468 0.00855422 0.00840044]

mean value: 0.008973932266235352

key: test_mcc
value: [0.8660254  0.71428571 1.         0.8660254  1.         0.8660254
 0.8660254  0.8660254  1.         1.        ]

mean value: 0.9044412733207908

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.85714286 1.         0.92307692 1.         0.93333333
 0.93333333 0.93333333 1.         1.        ]

mean value: 0.9513553113553114

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.85714286 1.         1.         1.         0.875
 0.875      0.875      1.         1.        ]

mean value: 0.9357142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.9500000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.75       1.         0.85714286 1.         0.875
 0.875      0.875      1.         1.        ]

mean value: 0.9107142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.67

Accuracy on Blind test: 0.8

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08541059 0.08450389 0.08509231 0.08471632 0.08483028 0.08506131
 0.09252977 0.09162402 0.09251976 0.08942318]

mean value: 0.0875711441040039

key: score_time
value: [0.01690626 0.01708364 0.01702738 0.01695395 0.01703334 0.01681542
 0.01859069 0.01716471 0.01851535 0.01865768]

mean value: 0.017474842071533204

key: test_mcc
value: [0.74535599 0.57735027 0.8660254  0.63245553 0.71428571 0.74535599
 0.28867513 0.63245553 1.         0.8660254 ]

mean value: 0.7067984974706242

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.78571429 0.92857143 0.78571429 0.85714286 0.85714286
 0.64285714 0.78571429 1.         0.92857143]

mean value: 0.8428571428571429

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.8        0.93333333 0.82352941 0.85714286 0.875
 0.66666667 0.72727273 1.         0.92307692]

mean value: 0.8439355252590547

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.75       0.875      0.7        0.85714286 0.77777778
 0.625      1.         1.         1.        ]

mean value: 0.8584920634920635

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 1.         1.         0.85714286 1.
 0.71428571 0.57142857 1.         0.85714286]

mean value: 0.8571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.78571429 0.92857143 0.78571429 0.85714286 0.85714286
 0.64285714 0.78571429 1.         0.92857143]

mean value: 0.8428571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.66666667 0.875      0.7        0.75       0.77777778
 0.5        0.57142857 1.         0.85714286]

mean value: 0.7412301587301587

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00889421 0.00980115 0.00881743 0.00876904 0.00982475 0.00965667
 0.00928593 0.00959969 0.00983977 0.00961208]

mean value: 0.00941007137298584

key: score_time
value: [0.00972748 0.00937581 0.00877976 0.00850892 0.00884318 0.00931907
 0.00854588 0.00929189 0.00882339 0.00881791]

mean value: 0.009003329277038574

key: test_mcc
value: [0.42857143 0.2773501  1.         0.63245553 0.4472136  0.63245553
 0.14285714 0.71428571 0.71428571 0.57735027]

mean value: 0.556682502686955

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.57142857 1.         0.78571429 0.71428571 0.78571429
 0.57142857 0.85714286 0.85714286 0.78571429]

mean value: 0.7642857142857142

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.7        1.         0.82352941 0.75       0.82352941
 0.57142857 0.85714286 0.85714286 0.8       ]

mean value: 0.7897058823529411

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.53846154 1.         0.7        0.66666667 0.7
 0.57142857 0.85714286 0.85714286 0.75      ]

mean value: 0.7355128205128205

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 1.         1.         1.         0.85714286 1.
 0.57142857 0.85714286 0.85714286 0.85714286]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.57142857 1.         0.78571429 0.71428571 0.78571429
 0.57142857 0.85714286 0.85714286 0.78571429]

mean value: 0.7642857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.53846154 1.         0.7        0.6        0.7
 0.4        0.75       0.75       0.66666667]

mean value: 0.6660683760683761

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.36

Accuracy on Blind test: 0.7

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.14242721 1.15457511 1.12844539 1.08648467 1.08412099 1.09782338
 1.10941672 1.12080956 1.16718316 1.16564441]

mean value: 1.1256930589675904

key: score_time
value: [0.0904541  0.09107041 0.08707428 0.08665013 0.08670735 0.09008956
 0.09070063 0.09385657 0.09427905 0.09441352]

mean value: 0.09052956104278564

key: test_mcc
value: [0.74535599 0.4472136  1.         0.63245553 0.8660254  0.74535599
 0.63245553 1.         1.         0.8660254 ]

mean value: 0.7934887452136047

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.71428571 1.         0.78571429 0.92857143 0.85714286
 0.78571429 1.         1.         0.92857143]

mean value: 0.8857142857142857

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.75       1.         0.82352941 0.93333333 0.875
 0.82352941 1.         1.         0.92307692]

mean value: 0.8961802413273001

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.66666667 1.         0.7        0.875      0.77777778
 0.7        1.         1.         1.        ]

mean value: 0.8719444444444444

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 1.         1.         1.         1.
 1.         1.         1.         0.85714286]

mean value: 0.9428571428571428

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.71428571 1.         0.78571429 0.92857143 0.85714286
 0.78571429 1.         1.         0.92857143]

mean value: 0.8857142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(

value: [0.71428571 0.6        1.         0.7        0.875      0.77777778
 0.7        1.         1.         0.85714286]

mean value: 0.8224206349206349

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.8670609  0.91074848 0.91317868 0.92834425 0.89048648 0.88200378
 0.89488196 0.90182328 0.87973428 0.86373258]

mean value: 0.8931994676589966

key: score_time
value: [0.22280693 0.19019127 0.17825437 0.22623491 0.11388898 0.20483541
 0.16181397 0.23762107 0.23933649 0.2001493 ]

mean value: 0.19751327037811278

key: test_mcc
value: [0.74535599 0.57735027 1.         0.63245553 0.8660254  0.74535599
 0.74535599 0.8660254  0.74535599 0.8660254 ]

mean value: 0.7789305982576338

key: train_mcc
value: [0.98425098 1.         0.98425098 0.98425098 0.98425098 0.96825397
 0.98425098 0.96825397 0.96825397 0.98425098]

mean value: 0.9810267810270763

key: test_accuracy
value: [0.85714286 0.78571429 1.         0.78571429 0.92857143 0.85714286
 0.85714286 0.92857143 0.85714286 0.92857143]

mean value: 0.8785714285714286

key: train_accuracy
value: [0.99206349 1.         0.99206349 0.99206349 0.99206349 0.98412698
 0.99206349 0.98412698 0.98412698 0.99206349]

mean value: 0.9904761904761905

key: test_fscore
value: [0.83333333 0.8        1.         0.82352941 0.93333333 0.875
 0.875      0.93333333 0.875      0.92307692]

mean value: 0.8871606334841629

key: train_fscore
value: [0.992      1.         0.992      0.992      0.992      0.98412698
 0.992      0.98412698 0.98412698 0.992     ]

mean value: 0.9904380952380951

key: test_precision
value: [1.         0.75       1.         0.7        0.875      0.77777778
 0.77777778 0.875      0.77777778 1.        ]

mean value: 0.8533333333333333

key: train_precision
value: [1.         1.         1.         1.         1.         0.98412698
 1.         0.98412698 0.98412698 1.        ]

mean value: 0.9952380952380953

key: test_recall
value: [0.71428571 0.85714286 1.         1.         1.         1.
 1.         1.         1.         0.85714286]

mean value: 0.9428571428571428

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9857142857142857

key: test_roc_auc
value: [0.85714286 0.78571429 1.         0.78571429 0.92857143 0.85714286
 0.85714286 0.92857143 0.85714286 0.92857143]

mean value: 0.8785714285714286

key: train_roc_auc
value: [0.99206349 1.         0.99206349 0.99206349 0.99206349 0.98412698
 0.99206349 0.98412698 0.98412698 0.99206349]

mean value: 0.9904761904761905

key: test_jcc
value: [0.71428571 0.66666667 1.         0.7        0.875      0.77777778
 0.77777778 0.875      0.77777778 0.85714286]

mean value: 0.8021428571428572

key: train_jcc
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.96875
 0.98412698 0.96875    0.96875    0.98412698]

mean value: 0.9811011904761905

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0122056  0.0089159  0.00880027 0.00890183 0.00872207 0.00890827
 0.00943804 0.00870562 0.00884724 0.00920796]

mean value: 0.009265279769897461

key: score_time
value: [0.01347136 0.00859499 0.00892377 0.00857043 0.00881839 0.00870991
 0.00861859 0.00867581 0.00872064 0.00905681]

mean value: 0.009216070175170898

key: test_mcc
value: [ 0.28867513 -0.1490712   0.4472136   0.31622777  0.4472136   0.8660254
  0.4472136   0.14285714  0.57735027  0.57735027]

mean value: 0.39610555736323716

key: train_mcc
value: [0.70276422 0.68811011 0.65610499 0.72345771 0.65610499 0.72011523
 0.67357531 0.76200076 0.70062273 0.71428571]

mean value: 0.6997141756428337

key: test_accuracy
value: [0.64285714 0.42857143 0.71428571 0.64285714 0.71428571 0.92857143
 0.71428571 0.57142857 0.78571429 0.78571429]

mean value: 0.6928571428571428

key: train_accuracy
value: [0.84920635 0.84126984 0.82539683 0.85714286 0.82539683 0.85714286
 0.83333333 0.88095238 0.84920635 0.85714286]

mean value: 0.8476190476190476

key: test_fscore
value: [0.61538462 0.33333333 0.75       0.70588235 0.66666667 0.93333333
 0.75       0.57142857 0.8        0.8       ]

mean value: 0.6926028873087696

key: train_fscore
value: [0.85714286 0.85074627 0.81355932 0.86764706 0.8358209  0.86567164
 0.84444444 0.88188976 0.85496183 0.85714286]

mean value: 0.8529026941398331

key: test_precision
value: [0.66666667 0.4        0.66666667 0.6        0.8        0.875
 0.66666667 0.57142857 0.75       0.75      ]

mean value: 0.6746428571428571

key: train_precision
value: [0.81428571 0.8028169  0.87272727 0.80821918 0.78873239 0.81690141
 0.79166667 0.875      0.82352941 0.85714286]

mean value: 0.825102180489476

key: test_recall
value: [0.57142857 0.28571429 0.85714286 0.85714286 0.57142857 1.
 0.85714286 0.57142857 0.85714286 0.85714286]

mean value: 0.7285714285714285

key: train_recall
value: [0.9047619  0.9047619  0.76190476 0.93650794 0.88888889 0.92063492
 0.9047619  0.88888889 0.88888889 0.85714286]

mean value: 0.8857142857142857

key: test_roc_auc
value: [0.64285714 0.42857143 0.71428571 0.64285714 0.71428571 0.92857143
 0.71428571 0.57142857 0.78571429 0.78571429]

mean value: 0.6928571428571428

key: train_roc_auc
value: [0.84920635 0.84126984 0.82539683 0.85714286 0.82539683 0.85714286
 0.83333333 0.88095238 0.84920635 0.85714286]

mean value: 0.8476190476190476

key: test_jcc
value: [0.44444444 0.2        0.6        0.54545455 0.5        0.875
 0.6        0.4        0.66666667 0.66666667]

mean value: 0.5498232323232323

key: train_jcc
value: [0.75       0.74025974 0.68571429 0.76623377 0.71794872 0.76315789
 0.73076923 0.78873239 0.74666667 0.75      ]

mean value: 0.7439482696695447

MCC on Blind test: 0.09

Accuracy on Blind test: 0.5

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06620526 0.04390788 0.08739209 0.05115843 0.04113436 0.04162669
 0.05210567 0.04808497 0.05309319 0.05471087]

mean value: 0.0539419412612915

key: score_time
value: [0.01065469 0.0107336  0.01117682 0.01046491 0.01068664 0.01112556
 0.01121974 0.01058078 0.0113976  0.01165676]

mean value: 0.010969710350036622

key: test_mcc
value: [0.8660254  0.71428571 1.         0.8660254  1.         0.8660254
 0.8660254  0.8660254  1.         1.        ]

mean value: 0.9044412733207908

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.85714286 1.         0.92307692 1.         0.93333333
 0.93333333 0.93333333 1.         1.        ]

mean value: 0.9513553113553114

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.85714286 1.         1.         1.         0.875
 0.875      0.875      1.         1.        ]

mean value: 0.9357142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.9500000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.75       1.         0.85714286 1.         0.875
 0.875      0.875      1.         1.        ]

mean value: 0.9107142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02775335 0.05391788 0.02138972 0.04810762 0.04875565 0.0398221
 0.04783416 0.04795527 0.04840779 0.02121758]

mean value: 0.040516114234924315

key: score_time
value: [0.02279043 0.01194072 0.01184464 0.02045107 0.0215559  0.01637101
 0.0214467  0.01568127 0.02402854 0.01179504]

mean value: 0.01779053211212158

key: test_mcc
value: [ 0.42857143 -0.31622777  0.4472136   0.4472136   0.17407766  0.57735027
  0.71428571  0.          0.63245553  0.52223297]

mean value: 0.3627172992886314

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.35714286 0.71428571 0.71428571 0.57142857 0.78571429
 0.85714286 0.5        0.78571429 0.71428571]

mean value: 0.6714285714285714

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.47058824 0.75       0.75       0.66666667 0.76923077
 0.85714286 0.53333333 0.82352941 0.77777778]

mean value: 0.7112554765495942

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.4        0.66666667 0.66666667 0.54545455 0.83333333
 0.85714286 0.5        0.7        0.63636364]

mean value: 0.651991341991342

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.57142857 0.85714286 0.85714286 0.85714286 0.71428571
 0.85714286 0.57142857 1.         1.        ]

mean value: 0.7999999999999999

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.35714286 0.71428571 0.71428571 0.57142857 0.78571429
 0.85714286 0.5        0.78571429 0.71428571]

mean value: 0.6714285714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.30769231 0.6        0.6        0.5        0.625
 0.75       0.36363636 0.7        0.63636364]

mean value: 0.5638247863247863

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01432347 0.0088129  0.00857687 0.01004863 0.0096302  0.00874901
 0.00938463 0.00930548 0.00913095 0.00847912]

mean value: 0.009644126892089844

key: score_time
value: [0.00970125 0.00875878 0.00872755 0.00919008 0.00907755 0.0087111
 0.00946212 0.00932574 0.00867939 0.00881839]

mean value: 0.00904519557952881

key: test_mcc
value: [ 0.1490712  -0.31622777  0.1490712   0.          0.28867513  0.71428571
  0.4472136   0.          0.31622777  0.4472136 ]

mean value: 0.2195530436880415

key: train_mcc
value: [0.42943789 0.39762767 0.38100038 0.45137812 0.4612481  0.42943789
 0.38332594 0.5095438  0.42943789 0.39762767]

mean value: 0.42700653469334915

key: test_accuracy
value: [0.57142857 0.35714286 0.57142857 0.5        0.64285714 0.85714286
 0.71428571 0.5        0.64285714 0.71428571]

mean value: 0.6071428571428571

key: train_accuracy
value: [0.71428571 0.6984127  0.69047619 0.72222222 0.73015873 0.71428571
 0.69047619 0.75396825 0.71428571 0.6984127 ]

mean value: 0.7126984126984127

key: test_fscore
value: [0.5        0.47058824 0.625      0.53333333 0.61538462 0.85714286
 0.75       0.22222222 0.54545455 0.75      ]

mean value: 0.5869125808831691

key: train_fscore
value: [0.72307692 0.70769231 0.69291339 0.74452555 0.73846154 0.72307692
 0.70676692 0.74380165 0.72307692 0.70769231]

mean value: 0.7211084426534746

key: test_precision
value: [0.6        0.4        0.55555556 0.5        0.66666667 0.85714286
 0.66666667 0.5        0.75       0.66666667]

mean value: 0.6162698412698413

key: train_precision
value: [0.70149254 0.68656716 0.6875     0.68918919 0.71641791 0.70149254
 0.67142857 0.77586207 0.70149254 0.68656716]

mean value: 0.7018009680329547

key: test_recall
value: [0.42857143 0.57142857 0.71428571 0.57142857 0.57142857 0.85714286
 0.85714286 0.14285714 0.42857143 0.85714286]

mean value: 0.6

key: train_recall
value: [0.74603175 0.73015873 0.6984127  0.80952381 0.76190476 0.74603175
 0.74603175 0.71428571 0.74603175 0.73015873]

mean value: 0.7428571428571429

key: test_roc_auc
value: [0.57142857 0.35714286 0.57142857 0.5        0.64285714 0.85714286
 0.71428571 0.5        0.64285714 0.71428571]

mean value: 0.6071428571428572

key: train_roc_auc
value: [0.71428571 0.6984127  0.69047619 0.72222222 0.73015873 0.71428571
 0.69047619 0.75396825 0.71428571 0.6984127 ]

mean value: 0.7126984126984127

key: test_jcc
value: [0.33333333 0.30769231 0.45454545 0.36363636 0.44444444 0.75
 0.6        0.125      0.375      0.6       ]

mean value: 0.43536519036519034

key: train_jcc
value: [0.56626506 0.54761905 0.53012048 0.59302326 0.58536585 0.56626506
 0.54651163 0.59210526 0.56626506 0.54761905]

mean value: 0.564115975842606

MCC on Blind test: 0.82

Accuracy on Blind test: 0.9

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01048803 0.01484013 0.01470304 0.01385427 0.01441598 0.01442957
 0.01361561 0.01406598 0.01326013 0.01453805]

mean value: 0.013821077346801759

key: score_time
value: [0.00835991 0.01159859 0.01156473 0.01146817 0.01151705 0.01143265
 0.01151991 0.01145792 0.01147008 0.01149678]

mean value: 0.011188578605651856

key: test_mcc
value: [0.74535599 0.28867513 0.63245553 0.52223297 0.8660254  0.74535599
 0.71428571 0.71428571 0.71428571 0.74535599]

mean value: 0.6688314158636953

key: train_mcc
value: [0.9369802  0.98425098 0.96825397 0.73251987 0.9369802  0.86248336
 0.95346259 0.9369802  0.84813571 0.79772404]

mean value: 0.8957771140682494

key: test_accuracy
value: [0.85714286 0.64285714 0.78571429 0.71428571 0.92857143 0.85714286
 0.85714286 0.85714286 0.85714286 0.85714286]

mean value: 0.8214285714285714

key: train_accuracy
value: [0.96825397 0.99206349 0.98412698 0.84920635 0.96825397 0.92857143
 0.97619048 0.96825397 0.92063492 0.88888889]

mean value: 0.9444444444444444

key: test_fscore
value: [0.83333333 0.66666667 0.82352941 0.77777778 0.93333333 0.875
 0.85714286 0.85714286 0.85714286 0.875     ]

mean value: 0.8356069094304388

key: train_fscore
value: [0.96774194 0.99212598 0.98412698 0.86896552 0.96875    0.93233083
 0.97560976 0.96774194 0.91525424 0.9       ]

mean value: 0.9472647177041439

key: test_precision
value: [1.         0.625      0.7        0.63636364 0.875      0.77777778
 0.85714286 0.85714286 0.85714286 0.77777778]

mean value: 0.7963347763347763

key: train_precision
value: [0.98360656 0.984375   0.98412698 0.76829268 0.95384615 0.88571429
 1.         0.98360656 0.98181818 0.81818182]

mean value: 0.9343568221368351

key: test_recall
value: [0.71428571 0.71428571 1.         1.         1.         1.
 0.85714286 0.85714286 0.85714286 1.        ]

mean value: 0.9

key: train_recall
value: [0.95238095 1.         0.98412698 1.         0.98412698 0.98412698
 0.95238095 0.95238095 0.85714286 1.        ]

mean value: 0.9666666666666667

key: test_roc_auc
value: [0.85714286 0.64285714 0.78571429 0.71428571 0.92857143 0.85714286
 0.85714286 0.85714286 0.85714286 0.85714286]

mean value: 0.8214285714285715

key: train_roc_auc
value: [0.96825397 0.99206349 0.98412698 0.84920635 0.96825397 0.92857143
 0.97619048 0.96825397 0.92063492 0.88888889]

mean value: 0.9444444444444444

key: test_jcc
value: [0.71428571 0.5        0.7        0.63636364 0.875      0.77777778
 0.75       0.75       0.75       0.77777778]

mean value: 0.7231204906204907

key: train_jcc
value: [0.9375     0.984375   0.96875    0.76829268 0.93939394 0.87323944
 0.95238095 0.9375     0.84375    0.81818182]

mean value: 0.9023363829503257

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01333761 0.01313019 0.01358414 0.01327586 0.01317477 0.01350617
 0.01306081 0.01340628 0.0130887  0.0129261 ]

mean value: 0.013249063491821289

key: score_time
value: [0.01067972 0.01146817 0.01150608 0.01141    0.01147485 0.01149297
 0.0115993  0.01143336 0.01155019 0.01152277]

mean value: 0.01141374111175537

key: test_mcc
value: [0.74535599 0.4472136  0.57735027 0.4472136  0.8660254  0.40824829
 0.52223297 1.         0.31622777 0.8660254 ]

mean value: 0.6195893284606143

key: train_mcc
value: [0.90659109 1.         0.92354815 0.69451634 0.95250095 0.43437224
 1.         0.96825397 0.81110711 0.9216805 ]

mean value: 0.8612570347645649

key: test_accuracy
value: [0.85714286 0.71428571 0.78571429 0.71428571 0.92857143 0.64285714
 0.71428571 1.         0.64285714 0.92857143]

mean value: 0.7928571428571429

key: train_accuracy
value: [0.95238095 1.         0.96031746 0.82539683 0.97619048 0.65873016
 1.         0.98412698 0.8968254  0.96031746]

mean value: 0.9214285714285714

key: test_fscore
value: [0.83333333 0.75       0.8        0.66666667 0.92307692 0.44444444
 0.77777778 1.         0.54545455 0.93333333]

mean value: 0.7674087024087024

key: train_fscore
value: [0.95384615 1.         0.95867769 0.78846154 0.976      0.48192771
 1.         0.98412698 0.88495575 0.96124031]

mean value: 0.8989236135518371

key: test_precision
value: [1.         0.66666667 0.75       0.8        1.         1.
 0.63636364 1.         0.75       0.875     ]

mean value: 0.8478030303030303

key: train_precision
value: [0.92537313 1.         1.         1.         0.98387097 1.
 1.         0.98412698 1.         0.93939394]

mean value: 0.9832765025591217

key: test_recall
value: [0.71428571 0.85714286 0.85714286 0.57142857 0.85714286 0.28571429
 1.         1.         0.42857143 1.        ]

mean value: 0.7571428571428571

key: train_recall
value: [0.98412698 1.         0.92063492 0.65079365 0.96825397 0.31746032
 1.         0.98412698 0.79365079 0.98412698]

mean value: 0.8603174603174603

key: test_roc_auc
value: [0.85714286 0.71428571 0.78571429 0.71428571 0.92857143 0.64285714
 0.71428571 1.         0.64285714 0.92857143]

mean value: 0.7928571428571429

key: train_roc_auc
value: [0.95238095 1.         0.96031746 0.82539683 0.97619048 0.65873016
 1.         0.98412698 0.8968254  0.96031746]

mean value: 0.9214285714285715

key: test_jcc
value: [0.71428571 0.6        0.66666667 0.5        0.85714286 0.28571429
 0.63636364 1.         0.375      0.875     ]

mean value: 0.651017316017316

key: train_jcc
value: [0.91176471 1.         0.92063492 0.65079365 0.953125   0.31746032
 1.         0.96875    0.79365079 0.92537313]

mean value: 0.8441552522750394

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10463905 0.09315062 0.09086013 0.09352469 0.09143782 0.0925982
 0.09318709 0.09407496 0.09470487 0.09759974]

mean value: 0.0945777177810669

key: score_time
value: [0.01460838 0.015522   0.01450491 0.0146203  0.01513147 0.01525116
 0.01474357 0.01484275 0.01491022 0.01478457]

mean value: 0.014891934394836426

key: test_mcc
value: [0.8660254  0.57735027 1.         0.8660254  1.         1.
 0.8660254  0.8660254  1.         1.        ]

mean value: 0.9041451884327381

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.78571429 1.         0.92857143 1.         1.
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.8        1.         0.92307692 1.         1.
 0.93333333 0.93333333 1.         1.        ]

mean value: 0.9523076923076923

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875 0.75  1.    1.    1.    1.    0.875 0.875 1.    1.   ]

mean value: 0.9375

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.78571429 1.         0.92857143 1.         1.
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.9500000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.66666667 1.         0.85714286 1.         1.
 0.875      0.875      1.         1.        ]

mean value: 0.9148809523809524

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03174639 0.03029752 0.04592466 0.03525257 0.04778743 0.04697156
 0.03519297 0.03099775 0.03000784 0.03634834]

mean value: 0.037052702903747556

key: score_time
value: [0.02208447 0.01594448 0.02029872 0.02367043 0.02881885 0.02223229
 0.02290559 0.0230298  0.02333617 0.03644538]

mean value: 0.02387661933898926

key: test_mcc
value: [0.8660254  0.71428571 1.         0.8660254  1.         0.8660254
 1.         0.8660254  1.         1.        ]

mean value: 0.917838732942347

key: train_mcc
value: [0.98425098 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9984250984251476

key: test_accuracy
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 1.         0.92857143 1.         1.        ]

mean value: 0.9571428571428572

key: train_accuracy
value: [0.99206349 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9992063492063492

key: test_fscore
value: [0.93333333 0.85714286 1.         0.92307692 1.         0.93333333
 1.         0.93333333 1.         1.        ]

mean value: 0.958021978021978

key: train_fscore
value: [0.99212598 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9992125984251968

key: test_precision
value: [0.875      0.85714286 1.         1.         1.         0.875
 1.         0.875      1.         1.        ]

mean value: 0.9482142857142857

key: train_precision
value: [0.984375 1.       1.       1.       1.       1.       1.       1.
 1.       1.      ]

mean value: 0.9984375

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 1.         0.92857143 1.         1.        ]

mean value: 0.9571428571428572

key: train_roc_auc
value: [0.99206349 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9992063492063492

key: test_jcc
value: [0.875      0.75       1.         0.85714286 1.         0.875
 1.         0.875      1.         1.        ]

mean value: 0.9232142857142858

key: train_jcc
value: [0.984375 1.       1.       1.       1.       1.       1.       1.
 1.       1.      ]

mean value: 0.9984375

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.01603222 0.018996   0.01812768 0.01927161 0.04491544 0.03093719
 0.01937819 0.02009773 0.04497838 0.04498601]

mean value: 0.027772045135498045

key: score_time
value: [0.01200271 0.01204848 0.01198554 0.01204681 0.01694942 0.01251435
 0.01236963 0.01236773 0.02137852 0.02063894]

mean value: 0.01443021297454834

key: test_mcc
value: [0.74535599 0.         0.74535599 0.74535599 0.8660254  0.57735027
 0.52223297 0.57735027 0.74535599 0.57735027]

mean value: 0.6101733149220129

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.5        0.85714286 0.85714286 0.92857143 0.78571429
 0.71428571 0.78571429 0.85714286 0.78571429]

mean value: 0.7928571428571428

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.875      0.46153846 0.875      0.875      0.93333333 0.8
 0.77777778 0.76923077 0.875      0.8       ]

mean value: 0.8041880341880342

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.77777778 0.5        0.77777778 0.77777778 0.875      0.75
 0.63636364 0.83333333 0.77777778 0.75      ]

mean value: 0.7455808080808081

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.42857143 1.         1.         1.         0.85714286
 1.         0.71428571 1.         0.85714286]

mean value: 0.8857142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.5        0.85714286 0.85714286 0.92857143 0.78571429
 0.71428571 0.78571429 0.85714286 0.78571429]

mean value: 0.7928571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.77777778 0.3        0.77777778 0.77777778 0.875      0.66666667
 0.63636364 0.625      0.77777778 0.66666667]

mean value: 0.6880808080808081

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.23829675 0.24501896 0.24500442 0.21486497 0.25001359 0.24446774
 0.24729133 0.25173736 0.25330758 0.24612641]

mean value: 0.2436129093170166

key: score_time
value: [0.01009297 0.00921941 0.00917554 0.00900936 0.00912881 0.00920177
 0.00989652 0.00910735 0.00983119 0.00928211]

mean value: 0.009394502639770508

key: test_mcc
value: [0.8660254  0.71428571 1.         0.8660254  1.         0.8660254
 0.8660254  0.8660254  1.         1.        ]

mean value: 0.9044412733207908

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.93333333 0.85714286 1.         0.92307692 1.         0.93333333
 0.93333333 0.93333333 1.         1.        ]

mean value: 0.9513553113553114

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.85714286 1.         1.         1.         0.875
 0.875      0.875      1.         1.        ]

mean value: 0.9357142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92857143 0.85714286 1.         0.92857143 1.         0.92857143
 0.92857143 0.92857143 1.         1.        ]

mean value: 0.9500000000000001

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.875      0.75       1.         0.85714286 1.         0.875
 0.875      0.875      1.         1.        ]

mean value: 0.9107142857142857

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.0148778  0.02256608 0.01641893 0.01648545 0.01651692 0.01666284
 0.01663566 0.01669383 0.01656461 0.01718378]

mean value: 0.01706058979034424

key: score_time
value: [0.0122633  0.01214314 0.01200151 0.01435137 0.01409698 0.01446939
 0.01430678 0.0145371  0.01454878 0.01470637]

mean value: 0.013742470741271972

key: test_mcc
value: [0.74535599 0.52223297 0.74535599 0.74535599 0.74535599 0.74535599
 0.63245553 0.63245553 0.8660254  0.74535599]

mean value: 0.7125305390718464

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.71428571 0.85714286 0.85714286 0.85714286 0.85714286
 0.78571429 0.78571429 0.92857143 0.85714286]

mean value: 0.8357142857142856

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.6        0.83333333 0.83333333 0.83333333 0.83333333
 0.72727273 0.72727273 0.92307692 0.83333333]

mean value: 0.7977622377622378

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.42857143 0.71428571 0.71428571 0.71428571 0.71428571
 0.57142857 0.57142857 0.85714286 0.71428571]

mean value: 0.6714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.71428571 0.85714286 0.85714286 0.85714286 0.85714286
 0.78571429 0.78571429 0.92857143 0.85714286]

mean value: 0.8357142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.42857143 0.71428571 0.71428571 0.71428571 0.71428571
 0.57142857 0.57142857 0.85714286 0.71428571]

mean value: 0.6714285714285714

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.6

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02142644 0.01297259 0.01400614 0.01322484 0.02798653 0.03346753
 0.03264356 0.03279185 0.0330112  0.03324366]

mean value: 0.02547743320465088

key: score_time
value: [0.01180696 0.01169729 0.01171327 0.01168156 0.02339411 0.02246428
 0.02254462 0.02227879 0.02086663 0.0208807 ]

mean value: 0.017932820320129394

key: test_mcc
value: [0.57735027 0.4472136  0.8660254  0.57735027 0.8660254  0.74535599
 0.4472136  0.42857143 1.         0.8660254 ]

mean value: 0.6821131361803842

key: train_mcc
value: [0.96825397 0.98425098 0.95250095 0.96825397 0.96825397 0.95250095
 0.98425098 0.96825397 0.96825397 0.96825397]

mean value: 0.9683027683029619

key: test_accuracy
value: [0.78571429 0.71428571 0.92857143 0.78571429 0.92857143 0.85714286
 0.71428571 0.71428571 1.         0.92857143]

mean value: 0.8357142857142857

key: train_accuracy
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.98412698 0.97619048
 0.99206349 0.98412698 0.98412698 0.98412698]

mean value: 0.9841269841269841

key: test_fscore
value: [0.76923077 0.75       0.93333333 0.8        0.93333333 0.875
 0.75       0.71428571 1.         0.93333333]

mean value: 0.8458516483516484

key: train_fscore
value: [0.98412698 0.99212598 0.97637795 0.98412698 0.98412698 0.97637795
 0.992      0.98412698 0.98412698 0.98412698]

mean value: 0.9841643794525684

key: test_precision
value: [0.83333333 0.66666667 0.875      0.75       0.875      0.77777778
 0.66666667 0.71428571 1.         0.875     ]

mean value: 0.8033730158730159

key: train_precision
value: [0.98412698 0.984375   0.96875    0.98412698 0.98412698 0.96875
 1.         0.98412698 0.98412698 0.98412698]

mean value: 0.9826636904761904

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 1.         1.
 0.85714286 0.71428571 1.         1.        ]

mean value: 0.9

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value:/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:148: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:151: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
 0.9857142857142857

key: test_roc_auc
value: [0.78571429 0.71428571 0.92857143 0.78571429 0.92857143 0.85714286
 0.71428571 0.71428571 1.         0.92857143]

mean value: 0.8357142857142857

key: train_roc_auc
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.98412698 0.97619048
 0.99206349 0.98412698 0.98412698 0.98412698]

mean value: 0.9841269841269842

key: test_jcc
value: [0.625      0.6        0.875      0.66666667 0.875      0.77777778
 0.6        0.55555556 1.         0.875     ]

mean value: 0.745

key: train_jcc
value: [0.96875    0.984375   0.95384615 0.96875    0.96875    0.95384615
 0.98412698 0.96875    0.96875    0.96875   ]

mean value: 0.9688694291819292

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.10278153 0.20039511 0.21716428 0.104352   0.16163802 0.16399074
 0.20072842 0.2223134  0.3059268  0.212677  ]

mean value: 0.18919672966003417

key: score_time
value: [0.02314091 0.02282786 0.02277565 0.01210189 0.01189661 0.02072382
 0.02222943 0.02230954 0.01674795 0.02327585]

mean value: 0.019802951812744142

key: test_mcc
value: [0.57735027 0.4472136  0.8660254  0.57735027 0.8660254  0.74535599
 0.4472136  0.42857143 0.74535599 0.8660254 ]

mean value: 0.6566487354303772

key: train_mcc
value: [0.96825397 0.98425098 0.95250095 0.96825397 0.96825397 0.95250095
 0.98425098 0.96825397 0.98425098 0.96825397]

mean value: 0.9699024699027128

key: test_accuracy
value: [0.78571429 0.71428571 0.92857143 0.78571429 0.92857143 0.85714286
 0.71428571 0.71428571 0.85714286 0.92857143]

mean value: 0.8214285714285714

key: train_accuracy
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.98412698 0.97619048
 0.99206349 0.98412698 0.99206349 0.98412698]

mean value: 0.9849206349206349

key: test_fscore
value: [0.76923077 0.75       0.93333333 0.8        0.93333333 0.875
 0.75       0.71428571 0.875      0.93333333]

mean value: 0.8333516483516483

key: train_fscore
value: [0.98412698 0.99212598 0.97637795 0.98412698 0.98412698 0.97637795
 0.992      0.98412698 0.99212598 0.98412698]

mean value: 0.9849642794650668

key: test_precision
value: [0.83333333 0.66666667 0.875      0.75       0.875      0.77777778
 0.66666667 0.71428571 0.77777778 0.875     ]

mean value: 0.7811507936507937

key: train_precision
value: [0.98412698 0.984375   0.96875    0.98412698 0.98412698 0.96875
 1.         0.98412698 0.984375   0.98412698]

mean value: 0.9826884920634921

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 1.         1.
 0.85714286 0.71428571 1.         1.        ]

mean value: 0.9

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 1.         0.98412698]

mean value: 0.9873015873015872

key: test_roc_auc
value: [0.78571429 0.71428571 0.92857143 0.78571429 0.92857143 0.85714286
 0.71428571 0.71428571 0.85714286 0.92857143]

mean value: 0.8214285714285715

key: train_roc_auc
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.98412698 0.97619048
 0.99206349 0.98412698 0.99206349 0.98412698]

mean value: 0.984920634920635

key: test_jcc
value: [0.625      0.6        0.875      0.66666667 0.875      0.77777778
 0.6        0.55555556 0.77777778 0.875     ]

mean value: 0.7227777777777777

key: train_jcc
value: [0.96875    0.984375   0.95384615 0.96875    0.96875    0.95384615
 0.98412698 0.96875    0.984375   0.96875   ]

mean value: 0.9704319291819292

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02898908 0.02259541 0.02249479 0.02232766 0.02245951 0.02295685
 0.02357912 0.02160358 0.03005314 0.02351069]

mean value: 0.02405698299407959

key: score_time
value: [0.01279664 0.01184416 0.01169562 0.01201034 0.0119729  0.01210237
 0.01197553 0.02426434 0.01212287 0.01198077]

mean value: 0.0132765531539917

key: test_mcc
value: [1.         0.77459667 0.25819889 0.57735027 0.         0.77459667
 0.77459667 0.5        0.09128709 0.73029674]

mean value: 0.5480923002918986

key: train_mcc
value: [0.94285714 0.91465912 0.91465912 0.91465912 0.91766294 0.94285714
 0.97182532 0.94285714 0.91587302 0.88730159]

mean value: 0.9265211645315969

key: test_accuracy
value: [1.         0.875      0.625      0.75       0.5        0.875
 0.875      0.75       0.57142857 0.85714286]

mean value: 0.7678571428571428

key: train_accuracy
value: [0.97142857 0.95714286 0.95714286 0.95714286 0.95714286 0.97142857
 0.98571429 0.97142857 0.95774648 0.94366197]

mean value: 0.9629979879275654

key: test_fscore
value: [1.         0.85714286 0.57142857 0.8        0.5        0.85714286
 0.88888889 0.75       0.4        0.88888889]

mean value: 0.7513492063492063

key: train_fscore
value: [0.97142857 0.95774648 0.95652174 0.95652174 0.95522388 0.97142857
 0.98550725 0.97142857 0.95774648 0.94285714]

mean value: 0.9626410420124032

key: test_precision
value: [1.         1.         0.66666667 0.66666667 0.5        1.
 0.8        0.75       0.5        0.8       ]

mean value: 0.7683333333333333

key: train_precision
value: [0.97142857 0.94444444 0.97058824 0.97058824 1.         0.97142857
 1.         0.97142857 0.97142857 0.94285714]

mean value: 0.9714192343604108

key: test_recall
value: [1.         0.75       0.5        1.         0.5        0.75
 1.         0.75       0.33333333 1.        ]

mean value: 0.7583333333333333

key: train_recall
value: [0.97142857 0.97142857 0.94285714 0.94285714 0.91428571 0.97142857
 0.97142857 0.97142857 0.94444444 0.94285714]

mean value: 0.9544444444444444

key: test_roc_auc
value: [1.         0.875      0.625      0.75       0.5        0.875
 0.875      0.75       0.54166667 0.83333333]

mean value: 0.7625

key: train_roc_auc
value: [0.97142857 0.95714286 0.95714286 0.95714286 0.95714286 0.97142857
 0.98571429 0.97142857 0.95793651 0.94365079]

mean value: 0.9630158730158731

key: test_jcc
value: [1.         0.75       0.4        0.66666667 0.33333333 0.75
 0.8        0.6        0.25       0.8       ]

mean value: 0.635

key: train_jcc
value: [0.94444444 0.91891892 0.91666667 0.91666667 0.91428571 0.94444444
 0.97142857 0.94444444 0.91891892 0.89189189]

mean value: 0.9282110682110682

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.52006674 0.63721442 0.58264375 0.52532816 0.55562091 0.57994127
 0.5116756  0.49916434 0.53129792 0.70804286]

mean value: 0.5650995969772339

key: score_time
value: [0.01221442 0.01496315 0.01473856 0.01227617 0.01471972 0.01287651
 0.01455784 0.01221538 0.01225281 0.01236057]

mean value: 0.013317513465881347

key: test_mcc
value: [1.         0.5        0.25819889 0.25819889 0.25819889 0.77459667
 0.77459667 0.5        0.41666667 0.73029674]

mean value: 0.5470753417731338

key: train_mcc
value: [0.94285714 1.         1.         0.94285714 1.         0.94285714
 1.         0.94285714 0.97220047 1.        ]

mean value: 0.9743629036957027

key: test_accuracy
value: [1.         0.75       0.625      0.625      0.625      0.875
 0.875      0.75       0.71428571 0.85714286]

mean value: 0.7696428571428572

key: train_accuracy
value: [0.97142857 1.         1.         0.97142857 1.         0.97142857
 1.         0.97142857 0.98591549 1.        ]

mean value: 0.9871629778672032

key: test_fscore
value: [1.         0.75       0.57142857 0.66666667 0.66666667 0.85714286
 0.88888889 0.75       0.66666667 0.88888889]

mean value: 0.7706349206349206

key: train_fscore
value: [0.97142857 1.         1.         0.97142857 1.         0.97142857
 1.         0.97142857 0.98630137 1.        ]

mean value: 0.98720156555773

key: test_precision
value: [1.         0.75       0.66666667 0.6        0.6        1.
 0.8        0.75       0.66666667 0.8       ]

mean value: 0.7633333333333333

key: train_precision
value: [0.97142857 1.         1.         0.97142857 1.         0.97142857
 1.         0.97142857 0.97297297 1.        ]

mean value: 0.9858687258687259

key: test_recall
value: [1.         0.75       0.5        0.75       0.75       0.75
 1.         0.75       0.66666667 1.        ]

mean value: 0.7916666666666666

key: train_recall
value: [0.97142857 1.         1.         0.97142857 1.         0.97142857
 1.         0.97142857 1.         1.        ]

mean value: 0.9885714285714285

key: test_roc_auc
value: [1.         0.75       0.625      0.625      0.625      0.875
 0.875      0.75       0.70833333 0.83333333]

mean value: 0.7666666666666667

key: train_roc_auc
value: [0.97142857 1.         1.         0.97142857 1.         0.97142857
 1.         0.97142857 0.98571429 1.        ]

mean value: 0.9871428571428572

key: test_jcc
value: [1.   0.6  0.4  0.5  0.5  0.75 0.8  0.6  0.5  0.8 ]

mean value: 0.645

key: train_jcc
value: [0.94444444 1.         1.         0.94444444 1.         0.94444444
 1.         0.94444444 0.97297297 1.        ]

mean value: 0.9750750750750751

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01222157 0.01113677 0.00878263 0.00857139 0.00830865 0.00838494
 0.00841308 0.00853992 0.00893879 0.00886178]

mean value: 0.009215950965881348

key: score_time
value: [0.01200318 0.00914693 0.00892782 0.00853491 0.00864744 0.00857234
 0.00861883 0.00864673 0.00854182 0.00876498]

mean value: 0.009040498733520507

key: test_mcc
value: [ 0.25819889  0.25819889  0.25819889  0.         -0.37796447  0.77459667
 -0.25819889  0.         -0.09128709  0.73029674]

mean value: 0.15520396261492722

key: train_mcc
value: [0.62882815 0.48154341 0.57166195 0.69954392 0.60395717 0.80829038
 0.46188022 0.73370909 0.71961897 0.63383658]

mean value: 0.6342869837555468

key: test_accuracy
value: [0.625      0.625      0.625      0.5        0.375      0.875
 0.375      0.5        0.42857143 0.85714286]

mean value: 0.5785714285714285

key: train_accuracy
value: [0.81428571 0.72857143 0.78571429 0.82857143 0.8        0.9
 0.72857143 0.85714286 0.85915493 0.81690141]

mean value: 0.8118913480885311

key: test_fscore
value: [0.66666667 0.66666667 0.57142857 0.5        0.54545455 0.88888889
 0.44444444 0.33333333 0.5        0.88888889]

mean value: 0.6005772005772005

key: train_fscore
value: [0.8115942  0.7654321  0.78873239 0.79310345 0.78787879 0.89230769
 0.74666667 0.83870968 0.85714286 0.8115942 ]

mean value: 0.8093162028619951

key: test_precision
value: [0.6        0.6        0.66666667 0.5        0.42857143 0.8
 0.4        0.5        0.4        0.8       ]

mean value: 0.5695238095238095

key: train_precision
value: [0.82352941 0.67391304 0.77777778 1.         0.83870968 0.96666667
 0.7        0.96296296 0.88235294 0.82352941]

mean value: 0.8449441893010905

key: test_recall
value: [0.75       0.75       0.5        0.5        0.75       1.
 0.5        0.25       0.66666667 1.        ]

mean value: 0.6666666666666666

key: train_recall
value: [0.8        0.88571429 0.8        0.65714286 0.74285714 0.82857143
 0.8        0.74285714 0.83333333 0.8       ]

mean value: 0.7890476190476191

key: test_roc_auc
value: [0.625      0.625      0.625      0.5        0.375      0.875
 0.375      0.5        0.45833333 0.83333333]

mean value: 0.5791666666666667

key: train_roc_auc
value: [0.81428571 0.72857143 0.78571429 0.82857143 0.8        0.9
 0.72857143 0.85714286 0.85952381 0.81666667]

mean value: 0.8119047619047619

key: test_jcc
value: [0.5        0.5        0.4        0.33333333 0.375      0.8
 0.28571429 0.2        0.33333333 0.8       ]

mean value: 0.4527380952380953

key: train_jcc
value: [0.68292683 0.62       0.65116279 0.65714286 0.65       0.80555556
 0.59574468 0.72222222 0.75       0.68292683]

mean value: 0.6817681765005958

MCC on Blind test: 0.82

Accuracy on Blind test: 0.9

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0088594  0.00871515 0.00853276 0.00873947 0.00854588 0.00868845
 0.00872922 0.00873923 0.00860667 0.00853539]

mean value: 0.008669161796569824

key: score_time
value: [0.00854993 0.00863361 0.00863743 0.00856376 0.00851822 0.00865054
 0.00859189 0.00869846 0.00846481 0.00855732]

mean value: 0.008586597442626954

key: test_mcc
value: [ 0.          0.          0.          0.37796447  0.25819889  0.5
  0.          0.         -0.16666667  0.75      ]

mean value: 0.17194966960897218

key: train_mcc
value: [0.6350853  0.69282032 0.6614769  0.71545476 0.69282032 0.78301997
 0.63089327 0.69282032 0.67079854 0.60683101]

mean value: 0.6782020711253509

key: test_accuracy
value: [0.5        0.5        0.5        0.625      0.625      0.75
 0.5        0.5        0.42857143 0.85714286]

mean value: 0.5785714285714285

key: train_accuracy
value: [0.81428571 0.84285714 0.82857143 0.85714286 0.84285714 0.88571429
 0.81428571 0.84285714 0.83098592 0.8028169 ]

mean value: 0.8362374245472837

key: test_fscore
value: [0.6        0.6        0.33333333 0.72727273 0.66666667 0.75
 0.6        0.5        0.33333333 0.85714286]

mean value: 0.5967748917748917

key: train_fscore
value: [0.82666667 0.85333333 0.83783784 0.86111111 0.85333333 0.89473684
 0.82191781 0.85333333 0.84615385 0.80555556]

mean value: 0.8453979667649458

key: test_precision
value: [0.5        0.5        0.5        0.57142857 0.6        0.75
 0.5        0.5        0.33333333 1.        ]

mean value: 0.5754761904761905

key: train_precision
value: [0.775      0.8        0.79487179 0.83783784 0.8        0.82926829
 0.78947368 0.8        0.78571429 0.78378378]

mean value: 0.7995949679101155

key: test_recall
value: [0.75       0.75       0.25       1.         0.75       0.75
 0.75       0.5        0.33333333 0.75      ]

mean value: 0.6583333333333333

key: train_recall
value: [0.88571429 0.91428571 0.88571429 0.88571429 0.91428571 0.97142857
 0.85714286 0.91428571 0.91666667 0.82857143]

mean value: 0.8973809523809524

key: test_roc_auc
value: [0.5        0.5        0.5        0.625      0.625      0.75
 0.5        0.5        0.41666667 0.875     ]

mean value: 0.5791666666666666

key: train_roc_auc
value: [0.81428571 0.84285714 0.82857143 0.85714286 0.84285714 0.88571429
 0.81428571 0.84285714 0.8297619  0.8031746 ]

mean value: 0.8361507936507937

key: test_jcc
value: [0.42857143 0.42857143 0.2        0.57142857 0.5        0.6
 0.42857143 0.33333333 0.2        0.75      ]

mean value: 0.444047619047619

key: train_jcc
value: [0.70454545 0.74418605 0.72093023 0.75609756 0.74418605 0.80952381
 0.69767442 0.74418605 0.73333333 0.6744186 ]

mean value: 0.7329081553727044

MCC on Blind test: 0.09

Accuracy on Blind test: 0.5

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00916004 0.009094   0.0092001  0.00905657 0.00823712 0.00933886
 0.00938821 0.00961804 0.00989199 0.00914526]

mean value: 0.009213018417358398

key: score_time
value: [0.00972176 0.00999951 0.0101099  0.00988507 0.00962853 0.00964975
 0.01003718 0.00978231 0.00975752 0.00999331]

mean value: 0.009856486320495605

key: test_mcc
value: [0.25819889 0.25819889 0.25819889 0.25819889 0.25819889 0.5
 0.25819889 0.37796447 0.75       0.41666667]

mean value: 0.359382447815886

key: train_mcc
value: [0.43139798 0.57166195 0.54374562 0.45883147 0.54374562 0.48891771
 0.6        0.48650924 0.49285714 0.54972312]

mean value: 0.516738984896947

key: test_accuracy
value: [0.625      0.625      0.625      0.625      0.625      0.75
 0.625      0.625      0.85714286 0.71428571]

mean value: 0.6696428571428571

key: train_accuracy
value: [0.71428571 0.78571429 0.77142857 0.72857143 0.77142857 0.74285714
 0.8        0.74285714 0.74647887 0.77464789]

mean value: 0.7578269617706237

key: test_fscore
value: [0.66666667 0.66666667 0.57142857 0.57142857 0.57142857 0.75
 0.57142857 0.4        0.85714286 0.75      ]

mean value: 0.6376190476190476

key: train_fscore
value: [0.6969697  0.7826087  0.77777778 0.71641791 0.76470588 0.75675676
 0.8        0.73529412 0.75       0.76470588]

mean value: 0.7545236719957108

key: test_precision
value: [0.6        0.6        0.66666667 0.66666667 0.66666667 0.75
 0.66666667 1.         0.75       0.75      ]

mean value: 0.7116666666666667

key: train_precision
value: [0.74193548 0.79411765 0.75675676 0.75       0.78787879 0.71794872
 0.8        0.75757576 0.75       0.78787879]

mean value: 0.7644091938968599

key: test_recall
value: [0.75 0.75 0.5  0.5  0.5  0.75 0.5  0.25 1.   0.75]

mean value: 0.625

key: train_recall
value: [0.65714286 0.77142857 0.8        0.68571429 0.74285714 0.8
 0.8        0.71428571 0.75       0.74285714]

mean value: 0.7464285714285714

key: test_roc_auc
value: [0.625      0.625      0.625      0.625      0.625      0.75
 0.625      0.625      0.875      0.70833333]

mean value: 0.6708333333333334

key: train_roc_auc
value: [0.71428571 0.78571429 0.77142857 0.72857143 0.77142857 0.74285714
 0.8        0.74285714 0.74642857 0.77420635]

mean value: 0.7577777777777778

key: test_jcc
value: [0.5  0.5  0.4  0.4  0.4  0.6  0.4  0.25 0.75 0.6 ]

mean value: 0.48

key: train_jcc
value: [0.53488372 0.64285714 0.63636364 0.55813953 0.61904762 0.60869565
 0.66666667 0.58139535 0.6        0.61904762]

mean value: 0.606709694080776

MCC on Blind test: 0.17

Accuracy on Blind test: 0.6

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01008749 0.00999761 0.00995326 0.00994325 0.00952315 0.00990009
 0.00955081 0.00994372 0.00989914 0.00985479]

mean value: 0.009865331649780273

key: score_time
value: [0.0110445  0.00936389 0.00931382 0.0096035  0.00935316 0.00944185
 0.00971365 0.00931597 0.00903106 0.00951076]

mean value: 0.009569215774536132

key: test_mcc
value: [ 1.          0.5         0.          0.5        -0.25819889  0.37796447
  0.5         0.25819889  0.09128709  0.73029674]

mean value: 0.3699548309266976

key: train_mcc
value: [0.82857143 0.80032673 0.82992752 0.82992752 0.860309   0.8871639
 0.77651637 0.74316054 0.83214239 0.75346834]

mean value: 0.8141513733500897

key: test_accuracy
value: [1.         0.75       0.5        0.75       0.375      0.625
 0.75       0.625      0.57142857 0.85714286]

mean value: 0.6803571428571429

key: train_accuracy
value: [0.91428571 0.9        0.91428571 0.91428571 0.92857143 0.94285714
 0.88571429 0.87142857 0.91549296 0.87323944]

mean value: 0.9060160965794768

key: test_fscore
value: [1.         0.75       0.33333333 0.75       0.28571429 0.4
 0.75       0.57142857 0.4        0.88888889]

mean value: 0.612936507936508

key: train_fscore
value: [0.91428571 0.89855072 0.91176471 0.91176471 0.92537313 0.94117647
 0.89189189 0.86956522 0.91891892 0.86153846]

mean value: 0.9044829945345272

key: test_precision
value: [1.         0.75       0.5        0.75       0.33333333 1.
 0.75       0.66666667 0.5        0.8       ]

mean value: 0.705

key: train_precision
value: [0.91428571 0.91176471 0.93939394 0.93939394 0.96875    0.96969697
 0.84615385 0.88235294 0.89473684 0.93333333]

mean value: 0.9199862231421829

key: test_recall
value: [1.         0.75       0.25       0.75       0.25       0.25
 0.75       0.5        0.33333333 1.        ]

mean value: 0.5833333333333334

key: train_recall
value: [0.91428571 0.88571429 0.88571429 0.88571429 0.88571429 0.91428571
 0.94285714 0.85714286 0.94444444 0.8       ]

mean value: 0.8915873015873016

key: test_roc_auc
value: [1.         0.75       0.5        0.75       0.375      0.625
 0.75       0.625      0.54166667 0.83333333]

mean value: 0.675

key: train_roc_auc
value: [0.91428571 0.9        0.91428571 0.91428571 0.92857143 0.94285714
 0.88571429 0.87142857 0.91507937 0.87222222]

mean value: 0.9058730158730158

key: test_jcc
value: [1.         0.6        0.2        0.6        0.16666667 0.25
 0.6        0.4        0.25       0.8       ]

mean value: 0.4866666666666667

key: train_jcc
value: [0.84210526 0.81578947 0.83783784 0.83783784 0.86111111 0.88888889
 0.80487805 0.76923077 0.85       0.75675676]

mean value: 0.8264435987285794

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.42219353 0.38538766 0.37766552 0.38039589 0.32598734 0.59394097
 0.33555698 0.37964678 0.39589477 0.39897919]

mean value: 0.39956486225128174

key: score_time
value: [0.01248217 0.01235509 0.01225305 0.01246142 0.01255774 0.01329589
 0.0128777  0.01322794 0.0129385  0.01259947]

mean value: 0.012704896926879882

key: test_mcc
value: [ 0.5         0.77459667  0.5         0.5        -0.25819889  0.
  0.5         0.5         0.16666667  0.73029674]

mean value: 0.39133611895012105

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.875      0.75       0.75       0.375      0.5
 0.75       0.75       0.57142857 0.85714286]

mean value: 0.6928571428571428

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.85714286 0.75       0.75       0.28571429 0.5
 0.75       0.75       0.57142857 0.88888889]

mean value: 0.6853174603174603

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       1.         0.75       0.75       0.33333333 0.5
 0.75       0.75       0.5        0.8       ]

mean value: 0.6883333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.75       0.75       0.75       0.25       0.5
 0.75       0.75       0.66666667 1.        ]

mean value: 0.6916666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.875      0.75       0.75       0.375      0.5
 0.75       0.75       0.58333333 0.83333333]

mean value: 0.6916666666666667

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.75       0.6        0.6        0.16666667 0.33333333
 0.6        0.6        0.4        0.8       ]

mean value: 0.545

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.5

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01383638 0.01134634 0.01063728 0.01032615 0.0097909  0.01033807
 0.01113963 0.01021552 0.01052022 0.01057673]

mean value: 0.010872721672058105

key: score_time
value: [0.01207805 0.00988507 0.00966096 0.00939751 0.00937295 0.00936341
 0.0089221  0.00934076 0.00945401 0.00952339]

mean value: 0.009699821472167969

key: test_mcc
value: [0.77459667 1.         0.77459667 1.         0.77459667 1.
 0.77459667 1.         1.         1.        ]

mean value: 0.9098386676965934

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875 1.    0.875 1.    0.875 1.    0.875 1.    1.    1.   ]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         0.85714286 1.         0.85714286 1.
 0.85714286 1.         1.         1.        ]

mean value: 0.946031746031746

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 1.  1.  1.  1.  1.  1.  1.  1.  1. ]

mean value: 0.98

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   0.75 1.   0.75 1.   0.75 1.   1.   1.  ]

mean value: 0.925

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875 1.    0.875 1.    0.875 1.    0.875 1.    1.    1.   ]

mean value: 0.95

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8  1.   0.75 1.   0.75 1.   0.75 1.   1.   1.  ]

mean value: 0.905

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09162545 0.0875473  0.08813334 0.08958077 0.0837934  0.08881235
 0.08904314 0.08884525 0.0853529  0.08809972]

mean value: 0.08808336257934571

key: score_time
value: [0.01862025 0.01851726 0.01855969 0.0186305  0.01841116 0.01861596
 0.01849103 0.01809096 0.01862431 0.01818585]

mean value: 0.018474698066711426

key: test_mcc
value: [ 0.77459667  1.          0.57735027  0.77459667  0.          0.
  0.77459667  0.25819889 -0.16666667  0.41666667]

mean value: 0.4409339166661237

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      1.         0.75       0.875      0.5        0.5
 0.875      0.625      0.42857143 0.71428571]

mean value: 0.7142857142857143

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         0.66666667 0.88888889 0.5        0.5
 0.88888889 0.57142857 0.33333333 0.75      ]

mean value: 0.6988095238095239

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        1.         1.         0.8        0.5        0.5
 0.8        0.66666667 0.33333333 0.75      ]

mean value: 0.715

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.5        1.         0.5        0.5
 1.         0.5        0.33333333 0.75      ]

mean value: 0.7083333333333334

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      1.         0.75       0.875      0.5        0.5
 0.875      0.625      0.41666667 0.70833333]

mean value: 0.7125

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        1.         0.5        0.8        0.33333333 0.33333333
 0.8        0.4        0.2        0.6       ]

mean value: 0.5766666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.0095346  0.00932455 0.00948405 0.00941014 0.00933146 0.00964332
 0.00960994 0.00947642 0.0096693  0.00891423]

mean value: 0.009439802169799805

key: score_time
value: [0.00931406 0.0092082  0.00925589 0.00923014 0.00934744 0.00957775
 0.00921679 0.00915599 0.00987196 0.00928354]

mean value: 0.00934617519378662

key: test_mcc
value: [ 0.25819889  0.          0.          0.57735027  0.         -0.25819889
 -0.25819889  0.25819889  0.41666667  0.75      ]

mean value: 0.17440169358562926

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.625      0.5        0.5        0.75       0.5        0.375
 0.375      0.625      0.71428571 0.85714286]

mean value: 0.5821428571428572

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.6        0.5        0.8        0.5        0.44444444
 0.28571429 0.57142857 0.66666667 0.85714286]

mean value: 0.5892063492063492

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.6        0.5        0.5        0.66666667 0.5        0.4
 0.33333333 0.66666667 0.66666667 1.        ]

mean value: 0.5833333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75       0.75       0.5        1.         0.5        0.5
 0.25       0.5        0.66666667 0.75      ]

mean value: 0.6166666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.625      0.5        0.5        0.75       0.5        0.375
 0.375      0.625      0.70833333 0.875     ]

mean value: 0.5833333333333334

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.42857143 0.33333333 0.66666667 0.33333333 0.28571429
 0.16666667 0.4        0.5        0.75      ]

mean value: 0.43642857142857144

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.5

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.06088781 1.07806635 1.07123256 1.10185742 1.08816147 1.07904243
 1.12222743 1.07182336 1.07926655 1.04409885]

mean value: 1.0796664237976075

key: score_time
value: [0.09509015 0.09501624 0.09373927 0.09523463 0.09523296 0.09541678
 0.09467912 0.09589577 0.09356499 0.09200788]

mean value: 0.09458777904510499

key: test_mcc
value: [0.77459667 0.77459667 0.77459667 0.77459667 0.         0.
 0.5        0.25819889 0.09128709 0.73029674]

mean value: 0.46781694029708437

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      0.875      0.875      0.875      0.5        0.5
 0.75       0.625      0.57142857 0.85714286]

mean value: 0.7303571428571428

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.88888889 0.85714286 0.88888889 0.5        0.5
 0.75       0.57142857 0.4        0.88888889]

mean value: 0.7134126984126985

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8        0.8        1.         0.8        0.5        0.5
 0.75       0.66666667 0.5        0.8       ]

mean value: 0.7116666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.75       1.         0.5        0.5
 0.75       0.5        0.33333333 1.        ]

mean value: 0.7333333333333333

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.875      0.875      0.875      0.5        0.5
 0.75       0.625      0.54166667 0.83333333]

mean value: 0.725

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8        0.8        0.75       0.8        0.33333333 0.33333333
 0.6        0.4        0.25       0.8       ]

mean value: 0.5866666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.79908037 0.86214375 0.85928059 0.828789   0.82958341 0.90800166
 0.87632132 0.92371035 0.8799963  0.87374616]

mean value: 0.8640652894973755

key: score_time
value: [0.21686745 0.18948936 0.22953916 0.2316184  0.22374058 0.22881055
 0.22816801 0.13683963 0.22541785 0.20880628]

mean value: 0.21192972660064696

key: test_mcc
value: [ 0.77459667  0.77459667  0.77459667  0.77459667 -0.25819889  0.77459667
  0.5         0.25819889 -0.16666667  0.73029674]

mean value: 0.49366134228809716

key: train_mcc
value: [0.97182532 0.97182532 0.97182532 1.         0.97182532 1.
 0.97182532 1.         0.97222222 0.97220047]

mean value: 0.9803549266788427

key: test_accuracy
value: [0.875      0.875      0.875      0.875      0.375      0.875
 0.75       0.625      0.42857143 0.85714286]

mean value: 0.7410714285714286

key: train_accuracy
value: [0.98571429 0.98571429 0.98571429 1.         0.98571429 1.
 0.98571429 1.         0.98591549 0.98591549]

mean value: 0.9900402414486922

key: test_fscore
value: [0.88888889 0.88888889 0.85714286 0.88888889 0.28571429 0.85714286
 0.75       0.57142857 0.33333333 0.88888889]

mean value: 0.721031746031746

key: train_fscore
value: [0.98550725 0.98550725 0.98550725 1.         0.98550725 1.
 0.98550725 1.         0.98591549 0.98550725]

mean value: 0.9898958971218615

key: test_precision
value: [0.8        0.8        1.         0.8        0.33333333 1.
 0.75       0.66666667 0.33333333 0.8       ]

mean value: 0.7283333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.75       1.         0.25       0.75
 0.75       0.5        0.33333333 1.        ]

mean value: 0.7333333333333333

key: train_recall
value: [0.97142857 0.97142857 0.97142857 1.         0.97142857 1.
 0.97142857 1.         0.97222222 0.97142857]

mean value: 0.9800793650793651

key: test_roc_auc
value: [0.875      0.875      0.875      0.875      0.375      0.875
 0.75       0.625      0.41666667 0.83333333]

mean value: 0.7375

key: train_roc_auc
value: [0.98571429 0.98571429 0.98571429 1.         0.98571429 1.
 0.98571429 1.         0.98611111 0.98571429]

mean value: 0.9900396825396826

key: test_jcc
value: [0.8        0.8        0.75       0.8        0.16666667 0.75
 0.6        0.4        0.2        0.8       ]

mean value: 0.6066666666666667

key: train_jcc
value: [0.97142857 0.97142857 0.97142857 1.         0.97142857 1.
 0.97142857 1.         0.97222222 0.97142857]

mean value: 0.9800793650793651

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02411747 0.00868654 0.00863671 0.00892663 0.00904179 0.00923443
 0.00946021 0.00934601 0.00884771 0.0092268 ]

mean value: 0.010552430152893066

key: score_time
value: [0.01154542 0.00867844 0.0087924  0.00879693 0.00910354 0.00886774
 0.00883126 0.00899911 0.00942969 0.00934529]

mean value: 0.009238982200622558

key: test_mcc
value: [ 0.          0.          0.          0.37796447  0.25819889  0.5
  0.          0.         -0.16666667  0.75      ]

mean value: 0.17194966960897218

key: train_mcc
value: [0.6350853  0.69282032 0.6614769  0.71545476 0.69282032 0.78301997
 0.63089327 0.69282032 0.67079854 0.60683101]

mean value: 0.6782020711253509

key: test_accuracy
value: [0.5        0.5        0.5        0.625      0.625      0.75
 0.5        0.5        0.42857143 0.85714286]

mean value: 0.5785714285714285

key: train_accuracy
value: [0.81428571 0.84285714 0.82857143 0.85714286 0.84285714 0.88571429
 0.81428571 0.84285714 0.83098592 0.8028169 ]

mean value: 0.8362374245472837

key: test_fscore
value: [0.6        0.6        0.33333333 0.72727273 0.66666667 0.75
 0.6        0.5        0.33333333 0.85714286]

mean value: 0.5967748917748917

key: train_fscore
value: [0.82666667 0.85333333 0.83783784 0.86111111 0.85333333 0.89473684
 0.82191781 0.85333333 0.84615385 0.80555556]

mean value: 0.8453979667649458

key: test_precision
value: [0.5        0.5        0.5        0.57142857 0.6        0.75
 0.5        0.5        0.33333333 1.        ]

mean value: 0.5754761904761905

key: train_precision
value: [0.775      0.8        0.79487179 0.83783784 0.8        0.82926829
 0.78947368 0.8        0.78571429 0.78378378]

mean value: 0.7995949679101155

key: test_recall
value: [0.75       0.75       0.25       1.         0.75       0.75
 0.75       0.5        0.33333333 0.75      ]

mean value: 0.6583333333333333

key: train_recall
value: [0.88571429 0.91428571 0.88571429 0.88571429 0.91428571 0.97142857
 0.85714286 0.91428571 0.91666667 0.82857143]

mean value: 0.8973809523809524

key: test_roc_auc
value: [0.5        0.5        0.5        0.625      0.625      0.75
 0.5        0.5        0.41666667 0.875     ]

mean value: 0.5791666666666666

key: train_roc_auc
value: [0.81428571 0.84285714 0.82857143 0.85714286 0.84285714 0.88571429
 0.81428571 0.84285714 0.8297619  0.8031746 ]

mean value: 0.8361507936507937

key: test_jcc
value: [0.42857143 0.42857143 0.2        0.57142857 0.5        0.6
 0.42857143 0.33333333 0.2        0.75      ]

mean value: 0.444047619047619

key: train_jcc
value: [0.70454545 0.74418605 0.72093023 0.75609756 0.74418605 0.80952381
 0.69767442 0.74418605 0.73333333 0.6744186 ]

mean value: 0.7329081553727044

MCC on Blind test: 0.09

Accuracy on Blind test: 0.5

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.04690886 0.03529429 0.03520536 0.03485751 0.05215406 0.03321004
 0.03399205 0.2213099  0.03269053 0.03631282]

mean value: 0.05619354248046875

key: score_time
value: [0.01023316 0.0105567  0.01112413 0.0102911  0.01223016 0.01042199
 0.01048231 0.01189733 0.01079249 0.01114678]

mean value: 0.01091761589050293

key: test_mcc
value: [0.77459667 1.         0.77459667 1.         0.77459667 1.
 1.         1.         1.         1.        ]

mean value: 0.9323790007724451

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875 1.    0.875 1.    0.875 1.    1.    1.    1.    1.   ]

mean value: 0.9625

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         0.85714286 1.         0.85714286 1.
 1.         1.         1.         1.        ]

mean value: 0.9603174603174603

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 1.  1.  1.  1.  1.  1.  1.  1.  1. ]

mean value: 0.98

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   0.75 1.   0.75 1.   1.   1.   1.   1.  ]

mean value: 0.95

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875 1.    0.875 1.    0.875 1.    1.    1.    1.    1.   ]

mean value: 0.9625

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8  1.   0.75 1.   0.75 1.   1.   1.   1.   1.  ]

mean value: 0.93

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03269625 0.03927255 0.04035974 0.03941774 0.0393281  0.03936481
 0.03942204 0.03904486 0.03839588 0.04210567]

mean value: 0.03894076347351074

key: score_time
value: [0.02378345 0.01394677 0.02182436 0.02240849 0.02304435 0.02370811
 0.01481962 0.01573515 0.02450657 0.02406669]

mean value: 0.0207843542098999

key: test_mcc
value: [0.5        0.5        0.37796447 1.         0.77459667 0.57735027
 0.5        0.25819889 0.         0.73029674]

mean value: 0.5218407044527719

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.75       0.625      1.         0.875      0.75
 0.75       0.625      0.57142857 0.85714286]

mean value: 0.7553571428571428

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.75       0.4        1.         0.88888889 0.8
 0.75       0.57142857 0.         0.88888889]

mean value: 0.6799206349206349

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.75       1.         1.         0.8        0.66666667
 0.75       0.66666667 0.         0.8       ]

mean value: 0.7183333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.75 0.25 1.   1.   1.   0.75 0.5  0.   1.  ]

mean value: 0.7

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.75       0.625      1.         0.875      0.75
 0.75       0.625      0.5        0.83333333]

mean value: 0.7458333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.6        0.25       1.         0.8        0.66666667
 0.6        0.4        0.         0.8       ]

mean value: 0.5716666666666667

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01487851 0.00880241 0.00839996 0.00833797 0.00825596 0.00826812
 0.00854874 0.00864625 0.00920296 0.00889826]

mean value: 0.00922391414642334

key: score_time
value: [0.0090704  0.0087707  0.00860238 0.00835609 0.00839686 0.00832748
 0.00927544 0.00877452 0.00908804 0.0086596 ]

mean value: 0.008732151985168458

key: test_mcc
value: [ 0.5         0.5         0.5         0.          0.          0.57735027
  0.         -0.25819889 -0.54772256  0.73029674]

mean value: 0.200172556527752

key: train_mcc
value: [0.57166195 0.51449576 0.54374562 0.51961524 0.45732956 0.5161854
 0.38152873 0.45883147 0.54920635 0.54972312]

mean value: 0.5062323194353353

key: test_accuracy
value: [0.75       0.75       0.75       0.5        0.5        0.75
 0.5        0.375      0.28571429 0.85714286]

mean value: 0.6017857142857143

key: train_accuracy
value: [0.78571429 0.75714286 0.77142857 0.75714286 0.72857143 0.75714286
 0.68571429 0.72857143 0.77464789 0.77464789]

mean value: 0.7520724346076458

key: test_fscore
value: [0.75       0.75       0.75       0.5        0.5        0.66666667
 0.5        0.28571429 0.         0.88888889]

mean value: 0.5591269841269841

key: train_fscore
value: [0.78873239 0.76056338 0.76470588 0.73846154 0.72463768 0.74626866
 0.71794872 0.73972603 0.77777778 0.76470588]

mean value: 0.7523527938814902

key: test_precision
value: [0.75       0.75       0.75       0.5        0.5        1.
 0.5        0.33333333 0.         0.8       ]

mean value: 0.5883333333333334

key: train_precision
value: [0.77777778 0.75       0.78787879 0.8        0.73529412 0.78125
 0.65116279 0.71052632 0.77777778 0.78787879]

mean value: 0.7559546355447339

key: test_recall
value: [0.75 0.75 0.75 0.5  0.5  0.5  0.5  0.25 0.   1.  ]

mean value: 0.55

key: train_recall
value: [0.8        0.77142857 0.74285714 0.68571429 0.71428571 0.71428571
 0.8        0.77142857 0.77777778 0.74285714]

mean value: 0.7520634920634921

key: test_roc_auc
value: [0.75       0.75       0.75       0.5        0.5        0.75
 0.5        0.375      0.25       0.83333333]

mean value: 0.5958333333333333

key: train_roc_auc
value: [0.78571429 0.75714286 0.77142857 0.75714286 0.72857143 0.75714286
 0.68571429 0.72857143 0.77460317 0.77420635]

mean value: 0.7520238095238095

key: test_jcc
value: [0.6        0.6        0.6        0.33333333 0.33333333 0.5
 0.33333333 0.16666667 0.         0.8       ]

mean value: 0.42666666666666664

key: train_jcc
value: [0.65116279 0.61363636 0.61904762 0.58536585 0.56818182 0.5952381
 0.56       0.58695652 0.63636364 0.61904762]

mean value: 0.6035000317610493

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0105958  0.01289964 0.0131743  0.0132277  0.01292658 0.01252127
 0.01212096 0.01317453 0.01325178 0.01340818]

mean value: 0.012730073928833009

key: score_time
value: [0.00837302 0.01156449 0.01140594 0.01190567 0.01152253 0.01143527
 0.01140642 0.01145077 0.01135659 0.01158524]

mean value: 0.011200594902038574

key: test_mcc
value: [0.77459667 0.77459667 0.         0.57735027 0.25819889 0.25819889
 0.77459667 0.5        0.09128709 0.73029674]

mean value: 0.4739121892666147

key: train_mcc
value: [0.97182532 0.97182532 0.94285714 0.91766294 0.91766294 0.91766294
 0.94440028 0.97182532 0.97222222 0.94365079]

mean value: 0.9471595194202586

key: test_accuracy
value: [0.875      0.875      0.5        0.75       0.625      0.625
 0.875      0.75       0.57142857 0.85714286]

mean value: 0.7303571428571428

key: train_accuracy
value: [0.98571429 0.98571429 0.97142857 0.95714286 0.95714286 0.95714286
 0.97142857 0.98571429 0.98591549 0.97183099]

mean value: 0.9729175050301812

key: test_fscore
value: [0.88888889 0.85714286 0.33333333 0.8        0.57142857 0.66666667
 0.88888889 0.75       0.4        0.88888889]

mean value: 0.7045238095238096

key: train_fscore
value: [0.98591549 0.98550725 0.97142857 0.95890411 0.95522388 0.95890411
 0.97058824 0.98550725 0.98591549 0.97142857]

mean value: 0.9729322956595473

key: test_precision
value: [0.8        1.         0.5        0.66666667 0.66666667 0.6
 0.8        0.75       0.5        0.8       ]

mean value: 0.7083333333333334

key: train_precision
value: [0.97222222 1.         0.97142857 0.92105263 1.         0.92105263
 1.         1.         1.         0.97142857]

mean value: 0.975718462823726

key: test_recall
value: [1.         0.75       0.25       1.         0.5        0.75
 1.         0.75       0.33333333 1.        ]

mean value: 0.7333333333333333

key: train_recall
value: [1.         0.97142857 0.97142857 1.         0.91428571 1.
 0.94285714 0.97142857 0.97222222 0.97142857]

mean value: 0.9715079365079365

key: test_roc_auc
value: [0.875      0.875      0.5        0.75       0.625      0.625
 0.875      0.75       0.54166667 0.83333333]

mean value: 0.725

key: train_roc_auc
value: [0.98571429 0.98571429 0.97142857 0.95714286 0.95714286 0.95714286
 0.97142857 0.98571429 0.98611111 0.9718254 ]

mean value: 0.972936507936508

key: test_jcc
value: [0.8        0.75       0.2        0.66666667 0.4        0.5
 0.8        0.6        0.25       0.8       ]

mean value: 0.5766666666666667

key: train_jcc
value: [0.97222222 0.97142857 0.94444444 0.92105263 0.91428571 0.92105263
 0.94285714 0.97142857 0.97222222 0.94444444]

mean value: 0.9475438596491228

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01302481 0.01203656 0.01209736 0.01269269 0.01250601 0.01253843
 0.01243162 0.01224613 0.01304078 0.01207447]

mean value: 0.012468886375427247

key: score_time
value: [0.01084328 0.01141644 0.0114181  0.01162481 0.01151371 0.01179433
 0.01152134 0.01150966 0.01145768 0.01138949]

mean value: 0.011448884010314941

key: test_mcc
value: [1.         0.25819889 0.57735027 0.57735027 0.25819889 0.
 0.77459667 0.25819889 1.         0.73029674]

mean value: 0.5434190620202439

key: train_mcc
value: [1.         0.79240582 0.47756693 0.81649658 1.         0.74535599
 0.8660254  0.8340361  1.         0.89282857]

mean value: 0.8424715394204912

key: test_accuracy
value: [1.         0.625      0.75       0.75       0.625      0.5
 0.875      0.625      1.         0.85714286]

mean value: 0.7607142857142857

key: train_accuracy
value: [1.         0.88571429 0.68571429 0.9        1.         0.85714286
 0.92857143 0.91428571 1.         0.94366197]

mean value: 0.9115090543259557

key: test_fscore
value: [1.         0.66666667 0.8        0.8        0.66666667 0.6
 0.88888889 0.57142857 1.         0.88888889]

mean value: 0.7882539682539682

key: train_fscore
value: [1.         0.8974359  0.76086957 0.90909091 1.         0.875
 0.93333333 0.90909091 1.         0.93939394]

mean value: 0.922421455356238

key: test_precision
value: [1.         0.6        0.66666667 0.66666667 0.6        0.5
 0.8        0.66666667 1.         0.8       ]

mean value: 0.73

key: train_precision
value: [1.         0.81395349 0.61403509 0.83333333 1.         0.77777778
 0.875      0.96774194 1.         1.        ]

mean value: 0.8881841622686374

key: test_recall
value: [1.   0.75 1.   1.   0.75 0.75 1.   0.5  1.   1.  ]

mean value: 0.875

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         0.85714286 1.         0.88571429]

mean value: 0.9742857142857143

key: test_roc_auc
value: [1.         0.625      0.75       0.75       0.625      0.5
 0.875      0.625      1.         0.83333333]

mean value: 0.7583333333333333

key: train_roc_auc
value: [1.         0.88571429 0.68571429 0.9        1.         0.85714286
 0.92857143 0.91428571 1.         0.94285714]

mean value: 0.9114285714285715

key: test_jcc
value: [1.         0.5        0.66666667 0.66666667 0.5        0.42857143
 0.8        0.4        1.         0.8       ]

mean value: 0.6761904761904762

key: train_jcc
value: [1.         0.81395349 0.61403509 0.83333333 1.         0.77777778
 0.875      0.83333333 1.         0.88571429]

mean value: 0.8633147306250122

MCC on Blind test: 0.61

Accuracy on Blind test: 0.8

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.09036541 0.08348179 0.08240795 0.08388019 0.08449244 0.07931352
 0.08360267 0.08131385 0.08257937 0.08250093]

mean value: 0.08339381217956543

key: score_time
value: [0.01578093 0.01575756 0.01484179 0.01599073 0.01581931 0.01575685
 0.01468801 0.01461697 0.01501131 0.0157845 ]

mean value: 0.015404796600341797

key: test_mcc
value: [0.77459667 0.77459667 0.77459667 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9323790007724451

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875 0.875 0.875 1.    1.    1.    1.    1.    1.    1.   ]

mean value: 0.9625

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 0.88888889 0.85714286 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9634920634920635

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 0.8 1.  1.  1.  1.  1.  1.  1.  1. ]

mean value: 0.96

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   0.75 1.   1.   1.   1.   1.   1.   1.  ]

mean value: 0.975

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875 0.875 0.875 1.    1.    1.    1.    1.    1.    1.   ]

mean value: 0.9625

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8  0.8  0.75 1.   1.   1.   1.   1.   1.   1.  ]

mean value: 0.935

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.82

Accuracy on Blind test: 0.9

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.0357008  0.02489686 0.04498053 0.03115368 0.03102827 0.04373121
 0.04515457 0.0286448  0.04884458 0.0265274 ]

mean value: 0.03606626987457275

key: score_time
value: [0.01645994 0.01881194 0.02027988 0.02257514 0.02798772 0.0293231
 0.02213931 0.02669263 0.02155805 0.02561426]

mean value: 0.02314419746398926

key: test_mcc
value: [0.77459667 1.         0.77459667 1.         0.77459667 1.
 1.         0.77459667 1.         1.        ]

mean value: 0.9098386676965934

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875 1.    0.875 1.    0.875 1.    1.    0.875 1.    1.   ]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         0.85714286 1.         0.85714286 1.
 1.         0.85714286 1.         1.        ]

mean value: 0.946031746031746

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8 1.  1.  1.  1.  1.  1.  1.  1.  1. ]

mean value: 0.98

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   0.75 1.   0.75 1.   1.   0.75 1.   1.  ]

mean value: 0.925

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875 1.    0.875 1.    0.875 1.    1.    0.875 1.    1.   ]

mean value: 0.95

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8  1.   0.75 1.   0.75 1.   1.   0.75 1.   1.  ]

mean value: 0.905

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.01824498 0.01467395 0.01475954 0.01525283 0.01523209 0.01748681
 0.01516032 0.01657128 0.01515055 0.01515698]

mean value: 0.015768933296203613

key: score_time
value: [0.01170707 0.01118374 0.01194334 0.01181722 0.01187658 0.01180911
 0.01189971 0.01253724 0.01190686 0.01177239]

mean value: 0.01184532642364502

key: test_mcc
value: [ 0.          0.          0.57735027  0.57735027  0.25819889  0.
  0.25819889  0.57735027 -0.54772256 -0.09128709]

mean value: 0.1609438936640506

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.5        0.5        0.75       0.75       0.625      0.5
 0.625      0.75       0.28571429 0.42857143]

mean value: 0.5714285714285714

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.6        0.5        0.66666667 0.8        0.66666667 0.5
 0.57142857 0.66666667 0.         0.33333333]

mean value: 0.5304761904761904

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.5        1.         0.66666667 0.6        0.5
 0.66666667 1.         0.         0.5       ]

mean value: 0.5933333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75 0.5  0.5  1.   0.75 0.5  0.5  0.5  0.   0.25]

mean value: 0.525

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.5        0.5        0.75       0.75       0.625      0.5
 0.625      0.75       0.25       0.45833333]

mean value: 0.5708333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.42857143 0.33333333 0.5        0.66666667 0.5        0.33333333
 0.4        0.5        0.         0.2       ]

mean value: 0.3861904761904762

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.14340568 0.11175632 0.1094327  0.10973072 0.11174297 0.11086988
 0.12684155 0.10878468 0.10972738 0.10989308]

mean value: 0.11521849632263184

key: score_time
value: [0.00917768 0.00896311 0.00931168 0.00914812 0.0098567  0.00918627
 0.00914383 0.00896716 0.00933671 0.00893211]

mean value: 0.009202337265014649

key: test_mcc
value: [0.77459667 1.         0.77459667 1.         0.77459667 1.
 0.77459667 1.         0.75       1.        ]

mean value: 0.8848386676965934

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.875      1.         0.875      1.         0.875      1.
 0.875      1.         0.85714286 1.        ]

mean value: 0.9357142857142857

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88888889 1.         0.85714286 1.         0.85714286 1.
 0.85714286 1.         0.85714286 1.        ]

mean value: 0.9317460317460318

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.8  1.   1.   1.   1.   1.   1.   1.   0.75 1.  ]

mean value: 0.955

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   1.   0.75 1.   0.75 1.   0.75 1.   1.   1.  ]

mean value: 0.925

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875 1.    0.875 1.    0.875 1.    0.875 1.    0.875 1.   ]

mean value: 0.9375

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8  1.   0.75 1.   0.75 1.   0.75 1.   0.75 1.  ]

mean value: 0.88

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
[0.00942969 0.01349497 0.01381683 0.01315737 0.01314712 0.01347995
 0.01326299 0.01326776 0.01324487 0.01427436]

mean value: 0.013057589530944824

key: score_time
value: [0.00895739 0.01176214 0.01184082 0.0119319  0.01179528 0.01169872
 0.01167274 0.01186228 0.01510572 0.01546717]

mean value: 0.012209415435791016

key: test_mcc
value: [ 0.          0.          0.25819889 -0.57735027  0.          0.25819889
  0.          0.         -0.41666667 -0.41666667]

mean value: -0.0894285823028637

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.5        0.5        0.625      0.25       0.5        0.625
 0.5        0.5        0.28571429 0.28571429]

mean value: 0.45714285714285713

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.5        0.33333333 0.57142857 0.4        0.5        0.66666667
 0.5        0.33333333 0.28571429 0.28571429]

mean value: 0.43761904761904763

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.5        0.66666667 0.33333333 0.5        0.6
 0.5        0.5        0.25       0.33333333]

mean value: 0.4683333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.5        0.25       0.5        0.5        0.5        0.75
 0.5        0.25       0.33333333 0.25      ]

mean value: 0.43333333333333335

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.5        0.5        0.625      0.25       0.5        0.625
 0.5        0.5        0.29166667 0.29166667]

mean value: 0.4583333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.33333333 0.2        0.4        0.25       0.33333333 0.5
 0.33333333 0.2        0.16666667 0.16666667]

mean value: 0.28833333333333333

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.5

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02254224 0.01239038 0.01327562 0.02693105 0.01684093 0.01240849
 0.01239514 0.03621268 0.03258634 0.03197312]

mean value: 0.02175559997558594

key: score_time
value: [0.01179028 0.01146269 0.01154327 0.01174164 0.01170254 0.01150298
 0.01150703 0.0206039  0.02236962 0.01990366]

mean value: 0.014412760734558105

key: test_mcc
value: [0.77459667 0.5        0.25819889 1.         0.25819889 0.5
 0.5        0.5        0.75       0.73029674]

mean value: 0.5771291192076027

key: train_mcc
value: [1.         0.97182532 0.97182532 0.97182532 1.         0.97182532
 1.         0.97182532 1.         0.97220047]

mean value: 0.9831327044566206

key: test_accuracy
value: [0.875      0.75       0.625      1.         0.625      0.75
 0.75       0.75       0.85714286 0.85714286]

mean value: 0.7839285714285714

key: train_accuracy
value: [1.         0.98571429 0.98571429 0.98571429 1.         0.98571429
 1.         0.98571429 1.         0.98591549]

mean value: 0.9914486921529175

key: test_fscore
value: [0.88888889 0.75       0.57142857 1.         0.66666667 0.75
 0.75       0.75       0.85714286 0.88888889]

mean value: 0.7873015873015873

key: train_fscore
value: [1.         0.98550725 0.98550725 0.98550725 1.         0.98550725
 1.         0.98550725 1.         0.98550725]

mean value: 0.9913043478260869

key: test_precision
value: [0.8        0.75       0.66666667 1.         0.6        0.75
 0.75       0.75       0.75       0.8       ]

mean value: 0.7616666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.5  1.   0.75 0.75 0.75 0.75 1.   1.  ]

mean value: 0.825

key: train_recall
value: [1.         0.97142857 0.97142857 0.97142857 1.         0.97142857
 1.         0.97142857 1.         0.97142857]

mean value: 0.9828571428571429

key: test_roc_auc
value: [0.875      0.75       0.625      1.         0.625      0.75
 0.75       0.75       0.875      0.83333333]

mean value: 0.7833333333333333

key: train_roc_auc
value: [1.         0.98571429 0.98571429 0.98571429 1.         0.98571429
 1.         0.98571429 1.         0.98571429]

mean value: 0.9914285714285714

key: test_jcc
value: [0.8  0.6  0.4  1.   0.5  0.6  0.6  0.6  0.75 0.8 ]

mean value: 0.665

key: train_jcc
value: [1.         0.97142857 0.97142857 0.97142857 1.         0.97142857
 1.         0.97142857 1.         0.97142857]

mean value: 0.9828571428571429

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:168: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.11865163 0.11227584 0.10895681 0.11447453 0.1230824  0.10879159
 0.10806894 0.12800884 0.14051032 0.13082147]

mean value: 0.11936423778533936

key: score_time
value: [0.0119226  0.01184297 0.01192427 0.01190925 0.01179338 0.01192069
 0.01186323 0.01195812 0.01183772 0.01185822]

mean value: 0.011883044242858886

key: test_mcc
value: [0.57735027 0.5        0.25819889 0.57735027 0.25819889 0.5
 0.5        0.5        1.         1.        ]

mean value: 0.5671098317873574

key: train_mcc
value: [1.         0.97182532 0.97182532 1.         1.         0.97182532
 1.         1.         1.         1.        ]

mean value: 0.9915475947422651

key: test_accuracy
value: [0.75  0.75  0.625 0.75  0.625 0.75  0.75  0.75  1.    1.   ]

mean value: 0.775

key: train_accuracy
value: [1.         0.98571429 0.98571429 1.         1.         0.98571429
 1.         1.         1.         1.        ]

mean value: 0.9957142857142858

key: test_fscore
value: [0.8        0.75       0.57142857 0.66666667 0.66666667 0.75
 0.75       0.75       1.         1.        ]

mean value: 0.7704761904761904

key: train_fscore
value: [1.         0.98550725 0.98550725 1.         1.         0.98550725
 1.         1.         1.         1.        ]

mean value: 0.9956521739130435

key: test_precision
value: [0.66666667 0.75       0.66666667 1.         0.6        0.75
 0.75       0.75       1.         1.        ]

mean value: 0.7933333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.   0.75 0.5  0.5  0.75 0.75 0.75 0.75 1.   1.  ]

mean value: 0.775

key: train_recall
value: [1.         0.97142857 0.97142857 1.         1.         0.97142857
 1.         1.         1.         1.        ]

mean value: 0.9914285714285714

key: test_roc_auc
value: [0.75  0.75  0.625 0.75  0.625 0.75  0.75  0.75  1.    1.   ]

mean value: 0.775

key: train_roc_auc
value: [1.         0.98571429 0.98571429 1.         1.         0.98571429
 1.         1.         1.         1.        ]

mean value: 0.9957142857142858

key: test_jcc
value: [0.66666667 0.6        0.4        0.5        0.5        0.6
 0.6        0.6        1.         1.        ]

mean value: 0.6466666666666666

key: train_jcc
value: [1.         0.97142857 0.97142857 1.         1.         0.97142857
 1.         1.         1.         1.        ]

mean value: 0.9914285714285714

MCC on Blind test: 0.67

Accuracy on Blind test: 0.8

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03346944 0.02624464 0.0318985  0.04968739 0.02914023 0.0289185
 0.02685785 0.02582574 0.04446888 0.03540468]

mean value: 0.033191585540771486

key: score_time
value: [0.01189733 0.01162148 0.01215982 0.01187849 0.0119226  0.01188231
 0.01174784 0.01169324 0.01198626 0.01169777]

mean value: 0.011848711967468261

key: test_mcc
value: [0.42857143 0.57735027 0.63245553 0.8660254  0.57735027 0.74535599
 0.74535599 0.57735027 0.8660254  0.63245553]

mean value: 0.6648296092776395

key: train_mcc
value: [0.92075092 0.9047619  0.87345612 0.90521816 0.93650794 0.87345612
 0.88900089 0.88900089 0.93650794 0.95250095]

mean value: 0.9081161839191141

key: test_accuracy
value: [0.71428571 0.78571429 0.78571429 0.92857143 0.78571429 0.85714286
 0.85714286 0.78571429 0.92857143 0.78571429]

mean value: 0.8214285714285714

key: train_accuracy
value: [0.96031746 0.95238095 0.93650794 0.95238095 0.96825397 0.93650794
 0.94444444 0.94444444 0.96825397 0.97619048]

mean value: 0.9539682539682539

key: test_fscore
value: [0.71428571 0.76923077 0.82352941 0.92307692 0.8        0.875
 0.875      0.8        0.92307692 0.82352941]

mean value: 0.8326729153199741

key: train_fscore
value: [0.96       0.95238095 0.9375     0.9516129  0.96825397 0.9375
 0.94488189 0.94488189 0.96825397 0.97637795]

mean value: 0.954164352439816

key: test_precision
value: [0.71428571 0.83333333 0.7        1.         0.75       0.77777778
 0.77777778 0.75       1.         0.7       ]

mean value: 0.8003174603174603

key: train_precision
value: [0.96774194 0.95238095 0.92307692 0.96721311 0.96825397 0.92307692
 0.9375     0.9375     0.96825397 0.96875   ]

mean value: 0.9513747785280704

key: test_recall
value: [0.71428571 0.71428571 1.         0.85714286 0.85714286 1.
 1.         0.85714286 0.85714286 1.        ]

mean value: 0.8857142857142857

key: train_recall
value: [0.95238095 0.95238095 0.95238095 0.93650794 0.96825397 0.95238095
 0.95238095 0.95238095 0.96825397 0.98412698]

mean value: 0.9571428571428571

key: test_roc_auc
value: [0.71428571 0.78571429 0.78571429 0.92857143 0.78571429 0.85714286
 0.85714286 0.78571429 0.92857143 0.78571429]

mean value: 0.8214285714285715

key: train_roc_auc
value: [0.96031746 0.95238095 0.93650794 0.95238095 0.96825397 0.93650794
 0.94444444 0.94444444 0.96825397 0.97619048]

mean value: 0.9539682539682539

key: test_jcc
value: [0.55555556 0.625      0.7        0.85714286 0.66666667 0.77777778
 0.77777778 0.66666667 0.85714286 0.7       ]

mean value: 0.7183730158730158

key: train_jcc
value: [0.92307692 0.90909091 0.88235294 0.90769231 0.93846154 0.88235294
 0.89552239 0.89552239 0.93846154 0.95384615]

mean value: 0.9126380029101715

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.70358634 0.80011916 0.68121314 0.6453855  0.65406132 0.78625679
 0.69044399 0.66302991 0.7690618  0.63810492]

mean value: 0.7031262874603271

key: score_time
value: [0.01948142 0.01645517 0.0154233  0.01446366 0.0120368  0.0143435
 0.0146184  0.01605535 0.01506495 0.0145781 ]

mean value: 0.015252065658569337

key: test_mcc
value: [0.1490712  0.57735027 0.8660254  1.         0.63245553 0.8660254
 0.8660254  0.74535599 1.         0.52223297]

mean value: 0.7224542171443628

key: train_mcc
value: [1.         1.         1.         0.98425098 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9984250984251476

key: test_accuracy
value: [0.57142857 0.78571429 0.92857143 1.         0.78571429 0.92857143
 0.92857143 0.85714286 1.         0.71428571]

mean value: 0.85

key: train_accuracy
value: [1.         1.         1.         0.99206349 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9992063492063492

key: test_fscore
value: [0.625      0.8        0.93333333 1.         0.82352941 0.93333333
 0.93333333 0.875      1.         0.77777778]

mean value: 0.8701307189542484

key: train_fscore
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[1.         1.         1.         0.99212598 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9992125984251968

key: test_precision
value: [0.55555556 0.75       0.875      1.         0.7        0.875
 0.875      0.77777778 1.         0.63636364]

mean value: 0.804469696969697

key: train_precision
value: [1.       1.       1.       0.984375 1.       1.       1.       1.
 1.       1.      ]

mean value: 0.9984375

key: test_recall
value: [0.71428571 0.85714286 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9571428571428572

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.57142857 0.78571429 0.92857143 1.         0.78571429 0.92857143
 0.92857143 0.85714286 1.         0.71428571]

mean value: 0.8500000000000001

key: train_roc_auc
value: [1.         1.         1.         0.99206349 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9992063492063492

key: test_jcc
value: [0.45454545 0.66666667 0.875      1.         0.7        0.875
 0.875      0.77777778 1.         0.63636364]

mean value: 0.7860353535353535

key: train_jcc
value: [1.       1.       1.       0.984375 1.       1.       1.       1.
 1.       1.      ]

mean value: 0.9984375

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01240516 0.01079416 0.00873613 0.00856209 0.00837946 0.00852203
 0.00998378 0.00867295 0.00848079 0.00854564]

mean value: 0.009308218955993652

key: score_time
value: [0.01174188 0.00898147 0.0090487  0.00852609 0.00863147 0.00845432
 0.00860214 0.00853848 0.0084095  0.00898838]

mean value: 0.008992242813110351

key: test_mcc
value: [ 0.17407766 -0.17407766  0.1490712   0.14285714  0.57735027  0.
  0.28867513  0.          0.1490712   0.31622777]

mean value: 0.16232527096583912

key: train_mcc
value: [0.57498891 0.4512753  0.40192095 0.38960673 0.49838198 0.46890221
 0.59209474 0.42943789 0.44494921 0.42177569]

mean value: 0.46733336027843136

key: test_accuracy
value: [0.57142857 0.42857143 0.57142857 0.57142857 0.78571429 0.5
 0.64285714 0.5        0.57142857 0.64285714]

mean value: 0.5785714285714285

key: train_accuracy
value: [0.78571429 0.6984127  0.6984127  0.68253968 0.74603175 0.73015873
 0.79365079 0.71428571 0.72222222 0.70634921]

mean value: 0.7277777777777777

key: test_fscore
value: [0.66666667 0.55555556 0.625      0.57142857 0.8        0.63157895
 0.61538462 0.46153846 0.5        0.70588235]

mean value: 0.6133035170883467

key: train_fscore
value: [0.79699248 0.75641026 0.72058824 0.72972973 0.76470588 0.75362319
 0.77966102 0.72307692 0.71544715 0.73381295]

mean value: 0.7474047817533758

key: test_precision
value: [0.54545455 0.45454545 0.55555556 0.57142857 0.75       0.5
 0.66666667 0.5        0.6        0.6       ]

mean value: 0.5743650793650793

key: train_precision
value: [0.75714286 0.6344086  0.67123288 0.63529412 0.71232877 0.69333333
 0.83636364 0.70149254 0.73333333 0.67105263]

mean value: 0.7045982692698753

key: test_recall
value: [0.85714286 0.71428571 0.71428571 0.57142857 0.85714286 0.85714286
 0.57142857 0.42857143 0.42857143 0.85714286]

mean value: 0.6857142857142857

key: train_recall
value: [0.84126984 0.93650794 0.77777778 0.85714286 0.82539683 0.82539683
 0.73015873 0.74603175 0.6984127  0.80952381]

mean value: 0.8047619047619048

key: test_roc_auc
value: [0.57142857 0.42857143 0.57142857 0.57142857 0.78571429 0.5
 0.64285714 0.5        0.57142857 0.64285714]

mean value: 0.5785714285714286

key: train_roc_auc
value: [0.78571429 0.6984127  0.6984127  0.68253968 0.74603175 0.73015873
 0.79365079 0.71428571 0.72222222 0.70634921]

mean value: 0.7277777777777779

key: test_jcc
value: [0.5        0.38461538 0.45454545 0.4        0.66666667 0.46153846
 0.44444444 0.3        0.33333333 0.54545455]

mean value: 0.44905982905982905

key: train_jcc
value: [0.6625     0.60824742 0.56321839 0.57446809 0.61904762 0.60465116
 0.63888889 0.56626506 0.55696203 0.57954545]

mean value: 0.5973794109421473

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00943851 0.00883722 0.00869012 0.00863743 0.0090723  0.00896025
 0.00871539 0.00867486 0.00866628 0.00882387]

mean value: 0.00885162353515625

key: score_time
value: [0.00867724 0.00851727 0.00857329 0.00897312 0.00852656 0.00869584
 0.00854874 0.00855994 0.00854278 0.00855541]

mean value: 0.008617019653320313

key: test_mcc
value: [ 0.28867513 -0.1490712   0.4472136   0.42857143  0.          0.31622777
  0.28867513  0.28867513  0.71428571  0.57735027]

mean value: 0.3200602978848017

key: train_mcc
value: [0.66877624 0.71572981 0.68811011 0.67357531 0.71464592 0.68811011
 0.63564173 0.78412547 0.70276422 0.70276422]

mean value: 0.6974243139367098

key: test_accuracy
value: [0.64285714 0.42857143 0.71428571 0.71428571 0.5        0.64285714
 0.64285714 0.64285714 0.85714286 0.78571429]

mean value: 0.6571428571428571

key: train_accuracy
value: [0.83333333 0.85714286 0.84126984 0.83333333 0.85714286 0.84126984
 0.81746032 0.88888889 0.84920635 0.84920635]

mean value: 0.8468253968253968

key: test_fscore
value: [0.66666667 0.33333333 0.75       0.71428571 0.53333333 0.70588235
 0.66666667 0.61538462 0.85714286 0.8       ]

mean value: 0.6642695539754363

key: train_fscore
value: [0.83969466 0.86153846 0.85074627 0.84444444 0.859375   0.85074627
 0.82170543 0.89552239 0.85714286 0.85714286]

mean value: 0.8538058628486893

key: test_precision
value: [0.625      0.4        0.66666667 0.71428571 0.5        0.6
 0.625      0.66666667 0.85714286 0.75      ]

mean value: 0.6404761904761904

key: train_precision
value: [0.80882353 0.8358209  0.8028169  0.79166667 0.84615385 0.8028169
 0.8030303  0.84507042 0.81428571 0.81428571]

mean value: 0.816477089470851

key: test_recall
value: [0.71428571 0.28571429 0.85714286 0.71428571 0.57142857 0.85714286
 0.71428571 0.57142857 0.85714286 0.85714286]

mean value: 0.7

key: train_recall
value: [0.87301587 0.88888889 0.9047619  0.9047619  0.87301587 0.9047619
 0.84126984 0.95238095 0.9047619  0.9047619 ]

mean value: 0.8952380952380953

key: test_roc_auc
value: [0.64285714 0.42857143 0.71428571 0.71428571 0.5        0.64285714
 0.64285714 0.64285714 0.85714286 0.78571429]

mean value: 0.6571428571428571

key: train_roc_auc
value: [0.83333333 0.85714286 0.84126984 0.83333333 0.85714286 0.84126984
 0.81746032 0.88888889 0.84920635 0.84920635]

mean value: 0.8468253968253968

key: test_jcc
value: [0.5        0.2        0.6        0.55555556 0.36363636 0.54545455
 0.5        0.44444444 0.75       0.66666667]

mean value: 0.5125757575757576

key: train_jcc
value: [0.72368421 0.75675676 0.74025974 0.73076923 0.75342466 0.74025974
 0.69736842 0.81081081 0.75       0.75      ]

mean value: 0.7453333567969473

MCC on Blind test: 0.09

Accuracy on Blind test: 0.5

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00975204 0.00937724 0.00941324 0.00952172 0.00948811 0.00852609
 0.00859451 0.00985909 0.00878239 0.00956988]

mean value: 0.009288430213928223

key: score_time
value: [0.0101881  0.01011419 0.01017499 0.01032948 0.01011992 0.00958657
 0.00983787 0.01024365 0.00955081 0.01083493]

mean value: 0.01009805202484131

key: test_mcc
value: [0.17407766 0.63245553 0.52223297 0.         0.31622777 0.4472136
 0.57735027 0.57735027 0.1490712  0.14285714]

mean value: 0.3538836397109643

key: train_mcc
value: [0.66742381 0.60721047 0.60693274 0.58759776 0.57323678 0.64482588
 0.53975054 0.62187434 0.65112184 0.63887656]

mean value: 0.6138850709215125

key: test_accuracy
value: [0.57142857 0.78571429 0.71428571 0.5        0.64285714 0.71428571
 0.78571429 0.78571429 0.57142857 0.57142857]

mean value: 0.6642857142857143

key: train_accuracy
value: [0.83333333 0.79365079 0.8015873  0.79365079 0.78571429 0.81746032
 0.76984127 0.80952381 0.82539683 0.81746032]

mean value: 0.8047619047619048

key: test_fscore
value: [0.66666667 0.72727273 0.77777778 0.46153846 0.70588235 0.75
 0.76923077 0.76923077 0.5        0.57142857]

mean value: 0.669902809608692

key: train_fscore
value: [0.8372093  0.81690141 0.81203008 0.79032258 0.79389313 0.83211679
 0.77165354 0.81818182 0.828125   0.82706767]

mean value: 0.8127501315363415

key: test_precision
value: [0.54545455 1.         0.63636364 0.5        0.6        0.66666667
 0.83333333 0.83333333 0.6        0.57142857]

mean value: 0.6786580086580086

key: train_precision
value: [0.81818182 0.73417722 0.77142857 0.80327869 0.76470588 0.77027027
 0.765625   0.7826087  0.81538462 0.78571429]

mean value: 0.781137504269914

key: test_recall
value: [0.85714286 0.57142857 1.         0.42857143 0.85714286 0.85714286
 0.71428571 0.71428571 0.42857143 0.57142857]

mean value: 0.7

key: train_recall
value: [0.85714286 0.92063492 0.85714286 0.77777778 0.82539683 0.9047619
 0.77777778 0.85714286 0.84126984 0.87301587]

mean value: 0.8492063492063492

key: test_roc_auc
value: [0.57142857 0.78571429 0.71428571 0.5        0.64285714 0.71428571
 0.78571429 0.78571429 0.57142857 0.57142857]

mean value: 0.6642857142857143

key: train_roc_auc
value: [0.83333333 0.79365079 0.8015873  0.79365079 0.78571429 0.81746032
 0.76984127 0.80952381 0.82539683 0.81746032]

mean value: 0.8047619047619048

key: test_jcc
value: [0.5        0.57142857 0.63636364 0.3        0.54545455 0.6
 0.625      0.625      0.33333333 0.4       ]

mean value: 0.5136580086580087

key: train_jcc
value: [0.72       0.69047619 0.6835443  0.65333333 0.65822785 0.7125
 0.62820513 0.69230769 0.70666667 0.70512821]

mean value: 0.685038936801595

MCC on Blind test: -0.17

Accuracy on Blind test: 0.4

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01123381 0.01120996 0.01115704 0.01081371 0.01120543 0.01074576
 0.01096678 0.01101112 0.0098691  0.00978637]

mean value: 0.010799908638000488

key: score_time
value: [0.01048398 0.00968146 0.00980759 0.00937915 0.00958252 0.00935674
 0.00955558 0.00967646 0.00892568 0.00904989]

mean value: 0.009549903869628906

key: test_mcc
value: [0.14285714 0.57735027 0.52223297 0.57735027 0.28867513 0.52223297
 0.8660254  0.28867513 1.         0.74535599]

mean value: 0.5530755282444576

key: train_mcc
value: [0.85725086 0.84511128 0.84297067 0.81116045 0.82633424 0.84297067
 0.84126984 0.82800868 0.88900089 0.84297067]

mean value: 0.8427048241806472

key: test_accuracy
value: [0.57142857 0.78571429 0.71428571 0.78571429 0.64285714 0.71428571
 0.92857143 0.64285714 1.         0.85714286]

mean value: 0.7642857142857142

key: train_accuracy
value: [0.92857143 0.92063492 0.92063492 0.9047619  0.91269841 0.92063492
 0.92063492 0.91269841 0.94444444 0.92063492]

mean value: 0.9206349206349206

key: test_fscore
value: [0.57142857 0.76923077 0.77777778 0.8        0.66666667 0.77777778
 0.93333333 0.61538462 1.         0.875     ]

mean value: 0.7786599511599511

key: train_fscore
value: [0.92913386 0.92424242 0.92307692 0.90769231 0.91472868 0.92307692
 0.92063492 0.91603053 0.94488189 0.92307692]

mean value: 0.9226575386353605

key: test_precision
value: [0.57142857 0.83333333 0.63636364 0.75       0.625      0.63636364
 0.875      0.66666667 1.         0.77777778]

mean value: 0.7371933621933622

key: train_precision
value: [0.921875   0.88405797 0.89552239 0.88059701 0.89393939 0.89552239
 0.92063492 0.88235294 0.9375     0.89552239]

mean value: 0.9007524405869756

key: test_recall
value: [0.57142857 0.71428571 1.         0.85714286 0.71428571 1.
 1.         0.57142857 1.         1.        ]

mean value: 0.8428571428571429

key: train_recall
value: [0.93650794 0.96825397 0.95238095 0.93650794 0.93650794 0.95238095
 0.92063492 0.95238095 0.95238095 0.95238095]

mean value: 0.946031746031746

key: test_roc_auc
value: [0.57142857 0.78571429 0.71428571 0.78571429 0.64285714 0.71428571
 0.92857143 0.64285714 1.         0.85714286]

mean value: 0.7642857142857143

key: train_roc_auc
value: [0.92857143 0.92063492 0.92063492 0.9047619  0.91269841 0.92063492
 0.92063492 0.91269841 0.94444444 0.92063492]

mean value: 0.9206349206349206

key: test_jcc
value: [0.4        0.625      0.63636364 0.66666667 0.5        0.63636364
 0.875      0.44444444 1.         0.77777778]

mean value: 0.6561616161616162

key: train_jcc
value: [0.86764706 0.85915493 0.85714286 0.83098592 0.84285714 0.85714286
 0.85294118 0.84507042 0.89552239 0.85714286]

mean value: 0.8565607605245167

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.49464512 0.49820232 0.5874176  0.53173232 0.50007153 0.56735539
 0.59726977 0.5393312  0.50509214 0.54775929]

mean value: 0.5368876695632935

key: score_time
value: [0.01225948 0.0286274  0.012254   0.01226807 0.01234031 0.01249647
 0.01239157 0.01230764 0.01273727 0.01596975]

mean value: 0.014365196228027344

key: test_mcc
value: [0.1490712  0.1490712  0.63245553 1.         0.74535599 0.8660254
 0.74535599 0.8660254  1.         0.4472136 ]

mean value: 0.6600574317102343

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.57142857 0.57142857 0.78571429 1.         0.85714286 0.92857143
 0.85714286 0.92857143 1.         0.71428571]

mean value: 0.8214285714285714

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.625      0.5        0.82352941 1.         0.875      0.93333333
 0.875      0.92307692 1.         0.75      ]

mean value: 0.8304939668174962

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.55555556 0.6        0.7        1.         0.77777778 0.875
 0.77777778 1.         1.         0.66666667]

mean value: 0.7952777777777778

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.42857143 1.         1.         1.         1.
 1.         0.85714286 1.         0.85714286]

mean value: 0.8857142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.57142857 0.57142857 0.78571429 1.         0.85714286 0.92857143
 0.85714286 0.92857143 1.         0.71428571]

mean value: 0.8214285714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.45454545 0.33333333 0.7        1.         0.77777778 0.875
 0.77777778 0.85714286 1.         0.6       ]

mean value: 0.73755772005772

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01468539 0.01246285 0.01255035 0.01075816 0.01069736 0.01113153
 0.01100087 0.01145768 0.01119852 0.01153445]

mean value: 0.01174771785736084

key: score_time
value: [0.01175427 0.00928426 0.00931358 0.00870562 0.00871897 0.00880766
 0.00885773 0.00903249 0.00884533 0.00887561]

mean value: 0.009219551086425781

key: test_mcc
value: [0.63245553 0.8660254  1.         0.8660254  0.8660254  0.63245553
 1.         0.8660254  1.         0.8660254 ]

mean value: 0.8595038082989546

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.92857143 1.         0.92857143 0.92857143 0.78571429
 1.         0.92857143 1.         0.92857143]

mean value: 0.9214285714285715

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.92307692 1.         0.92307692 0.93333333 0.82352941
 1.         0.92307692 1.         0.93333333]

mean value: 0.9282956259426848

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7   1.    1.    1.    0.875 0.7   1.    1.    1.    0.875]

mean value: 0.915

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         0.85714286 1.         1.        ]

mean value: 0.9571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.92857143 1.         0.92857143 0.92857143 0.78571429
 1.         0.92857143 1.         0.92857143]

mean value: 0.9214285714285715

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.85714286 1.         0.85714286 0.875      0.7
 1.         0.85714286 1.         0.875     ]

mean value: 0.8721428571428571

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.67

Accuracy on Blind test: 0.8

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.0882473  0.08731794 0.08704042 0.08775353 0.08557534 0.08603501
 0.08768535 0.08798957 0.08857989 0.08624673]

mean value: 0.08724710941314698

key: score_time
value: [0.01715994 0.01705217 0.01723671 0.01793122 0.01705194 0.01739645
 0.01781774 0.01779103 0.0172832  0.01782751]

mean value: 0.01745479106903076

key: test_mcc
value: [0.42857143 0.71428571 0.8660254  1.         0.4472136  0.8660254
 0.74535599 0.74535599 1.         0.4472136 ]

mean value: 0.7260047126425796

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.85714286 0.92857143 1.         0.71428571 0.92857143
 0.85714286 0.85714286 1.         0.71428571]

mean value: 0.8571428571428571

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.85714286 0.93333333 1.         0.75       0.93333333
 0.83333333 0.83333333 1.         0.75      ]

mean value: 0.8604761904761905

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.85714286 0.875      1.         0.66666667 0.875
 1.         1.         1.         0.66666667]

mean value: 0.8654761904761905

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 1.         1.         0.85714286 1.
 0.71428571 0.71428571 1.         0.85714286]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.85714286 0.92857143 1.         0.71428571 0.92857143
 0.85714286 0.85714286 1.         0.71428571]

mean value: 0.8571428571428572

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.75       0.875      1.         0.6        0.875
 0.71428571 0.71428571 1.         0.6       ]

mean value: 0.7684126984126984

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00889707 0.00916862 0.00925422 0.00874209 0.00883031 0.00954938
 0.00880361 0.0087378  0.00891757 0.00876474]

mean value: 0.008966541290283203

key: score_time
value: [0.00904775 0.00855803 0.00931263 0.00873899 0.00855589 0.00873303
 0.00867724 0.00862074 0.00868821 0.008636  ]

mean value: 0.00875685214996338

key: test_mcc
value: [0.42857143 0.57735027 0.42857143 0.63245553 0.31622777 0.8660254
 0.57735027 0.57735027 0.74535599 0.63245553]

mean value: 0.5781713891080292

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.78571429 0.71428571 0.78571429 0.64285714 0.92857143
 0.78571429 0.78571429 0.85714286 0.78571429]

mean value: 0.7785714285714286

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.8        0.71428571 0.82352941 0.70588235 0.93333333
 0.8        0.76923077 0.875      0.82352941]

mean value: 0.7959076707606119

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.75       0.71428571 0.7        0.6        0.875
 0.75       0.83333333 0.77777778 0.7       ]

mean value: 0.741468253968254

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 0.71428571 1.         0.85714286 1.
 0.85714286 0.71428571 1.         1.        ]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.78571429 0.71428571 0.78571429 0.64285714 0.92857143
 0.78571429 0.78571429 0.85714286 0.78571429]

mean value: 0.7785714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.66666667 0.55555556 0.7        0.54545455 0.875
 0.66666667 0.625      0.77777778 0.7       ]

mean value: 0.6667676767676768

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.36

Accuracy on Blind test: 0.7

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.10285902 1.08632183 1.11166525 1.13373566 1.12290573 1.09281826
 1.16202903 1.11271644 1.12481833 1.09064579]

mean value: 1.1140515327453613

key: score_time
value: [0.08765078 0.09237862 0.09297252 0.15036464 0.08707857 0.08621287
 0.08813596 0.09109473 0.09038901 0.08686709]

mean value: 0.09531447887420655

key: test_mcc
value: [0.42857143 0.71428571 1.         0.8660254  0.74535599 0.74535599
 1.         1.         1.         0.4472136 ]

mean value: 0.7946808127141399

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.85714286 1.         0.92857143 0.85714286 0.85714286
 1.         1.         1.         0.71428571]

mean value: 0.8928571428571428

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.71428571 0.85714286 1.         0.92307692 0.875      0.875
 1.         1.         1.         0.75      ]

mean value: 0.8994505494505495

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.71428571 0.85714286 1.         1.         0.77777778 0.77777778
 1.         1.         1.         0.66666667]

mean value: 0.8793650793650793

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         0.85714286]

mean value: 0.9285714285714286

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.85714286 1.         0.92857143 0.85714286 0.85714286
 1.         1.         1.         0.71428571]

mean value: 0.8928571428571429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.55555556 0.75       1.         0.85714286 0.77777778 0.77777778
 1.         1.         1.         0.6       ]

mean value: 0.8318253968253968

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...05', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.82384944 0.92019367 0.84741855 0.8575809  0.95540094 0.89151525
 0.84008265 0.88904452 0.86206412 0.88456702]

mean value: 0.8771717071533203

key: score_time
value: [0.19765759 0.22397804 0.18344784 0.11897373 0.20639133 0.18086195
 0.21533513 0.23305535 0.18315172 0.21732092]

mean value: 0.19601736068725586

key: test_mcc
value: [0.28867513 0.71428571 0.8660254  0.8660254  0.74535599 0.4472136
 1.         1.         1.         0.57735027]

mean value: 0.7504931513638918

key: train_mcc
value: [0.98425098 1.         0.98425098 0.96825397 0.98425098 0.93650794
 0.98425098 0.98425098 0.98425098 0.98425098]

mean value: 0.9794518794522239

key: test_accuracy
value: [0.64285714 0.85714286 0.92857143 0.92857143 0.85714286 0.71428571
 1.         1.         1.         0.78571429]

mean value: 0.8714285714285714

key: train_accuracy
value: [0.99206349 1.         0.99206349 0.98412698 0.99206349 0.96825397
 0.99206349 0.99206349 0.99206349 0.99206349]

mean value: 0.9896825396825397

key: test_fscore
value: [0.66666667 0.85714286 0.93333333 0.93333333 0.875      0.75
 1.         1.         1.         0.8       ]

mean value: 0.881547619047619

key: train_fscore
value: [0.992      1.         0.992      0.98412698 0.992      0.96825397
 0.992      0.992      0.992      0.992     ]

mean value: 0.9896380952380952

key: test_precision
value: [0.625      0.85714286 0.875      0.875      0.77777778 0.66666667
 1.         1.         1.         0.75      ]

mean value: 0.8426587301587302

key: train_precision
value: [1.         1.         1.         0.98412698 1.         0.96825397
 1.         1.         1.         1.        ]

mean value: 0.9952380952380953

key: test_recall
value: [0.71428571 0.85714286 1.         1.         1.         0.85714286
 1.         1.         1.         0.85714286]

mean value: 0.9285714285714286

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.96825397
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9841269841269841

key: test_roc_auc
value: [0.64285714 0.85714286 0.92857143 0.92857143 0.85714286 0.71428571
 1.         1.         1.         0.78571429]

mean value: 0.8714285714285714

key: train_roc_auc
value: [0.99206349 1.         0.99206349 0.98412698 0.99206349 0.96825397
 0.99206349 0.99206349 0.99206349 0.99206349]

mean value: 0.9896825396825397

key: test_jcc
value: [0.5        0.75       0.875      0.875      0.77777778 0.6
 1.         1.         1.         0.66666667]

mean value: 0.8044444444444444

key: train_jcc
value: [0.98412698 1.         0.98412698 0.96875    0.98412698 0.93846154
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9796100427350427

MCC on Blind test: 0.8

Accuracy on Blind test: 0.9

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01110482 0.00873327 0.00868821 0.00868058 0.00864291 0.00879645
 0.00917721 0.00874853 0.00868726 0.0091517 ]

mean value: 0.009041094779968261

key: score_time
value: [0.00973678 0.00864267 0.00871468 0.00847268 0.00851536 0.00854301
 0.00959444 0.00855827 0.00854039 0.0086143 ]

mean value: 0.008793258666992187

key: test_mcc
value: [ 0.28867513 -0.1490712   0.4472136   0.42857143  0.          0.31622777
  0.28867513  0.28867513  0.71428571  0.57735027]

mean value: 0.3200602978848017

key: train_mcc
value: [0.66877624 0.71572981 0.68811011 0.67357531 0.71464592 0.68811011
 0.63564173 0.78412547 0.70276422 0.70276422]

mean value: 0.6974243139367098

key: test_accuracy
value: [0.64285714 0.42857143 0.71428571 0.71428571 0.5        0.64285714
 0.64285714 0.64285714 0.85714286 0.78571429]

mean value: 0.6571428571428571

key: train_accuracy
value: [0.83333333 0.85714286 0.84126984 0.83333333 0.85714286 0.84126984
 0.81746032 0.88888889 0.84920635 0.84920635]

mean value: 0.8468253968253968

key: test_fscore
value: [0.66666667 0.33333333 0.75       0.71428571 0.53333333 0.70588235
 0.66666667 0.61538462 0.85714286 0.8       ]

mean value: 0.6642695539754363

key: train_fscore
value: [0.83969466 0.86153846 0.85074627 0.84444444 0.859375   0.85074627
 0.82170543 0.89552239 0.85714286 0.85714286]

mean value: 0.8538058628486893

key: test_precision
value: [0.625      0.4        0.66666667 0.71428571 0.5        0.6
 0.625      0.66666667 0.85714286 0.75      ]

mean value: 0.6404761904761904

key: train_precision
value: [0.80882353 0.8358209  0.8028169  0.79166667 0.84615385 0.8028169
 0.8030303  0.84507042 0.81428571 0.81428571]

mean value: 0.816477089470851

key: test_recall
value: [0.71428571 0.28571429 0.85714286 0.71428571 0.57142857 0.85714286
 0.71428571 0.57142857 0.85714286 0.85714286]

mean value: 0.7

key: train_recall
value: [0.87301587 0.88888889 0.9047619  0.9047619  0.87301587 0.9047619
 0.84126984 0.95238095 0.9047619  0.9047619 ]

mean value: 0.8952380952380953

key: test_roc_auc
value: [0.64285714 0.42857143 0.71428571 0.71428571 0.5        0.64285714
 0.64285714 0.64285714 0.85714286 0.78571429]

mean value: 0.6571428571428571

key: train_roc_auc
value: [0.83333333 0.85714286 0.84126984 0.83333333 0.85714286 0.84126984
 0.81746032 0.88888889 0.84920635 0.84920635]

mean value: 0.8468253968253968

key: test_jcc
value: [0.5        0.2        0.6        0.55555556 0.36363636 0.54545455
 0.5        0.44444444 0.75       0.66666667]

mean value: 0.5125757575757576

key: train_jcc
value: [0.72368421 0.75675676 0.74025974 0.73076923 0.75342466 0.74025974
 0.69736842 0.81081081 0.75       0.75      ]

mean value: 0.7453333567969473

MCC on Blind test: 0.09

Accuracy on Blind test: 0.5

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.05601382 0.04525709 0.04223561 0.04140973 0.04675841 0.04774737
 0.05823922 0.04861522 0.07092834 0.04147816]

mean value: 0.049868297576904294

key: score_time
value: [0.01060367 0.01014566 0.01036358 0.01017857 0.01060271 0.01053643
 0.01062751 0.01058984 0.01028037 0.0101583 ]

mean value: 0.010408663749694824

key: test_mcc
value: [0.63245553 0.8660254  1.         0.8660254  0.8660254  1.
 1.         1.         1.         0.8660254 ]

mean value: 0.9096557147171431

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.92857143 1.         0.92857143 0.92857143 1.
 1.         1.         1.         0.92857143]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.92307692 1.         0.92307692 0.93333333 1.
 1.         1.         1.         0.93333333]

mean value: 0.9536349924585219

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7   1.    1.    1.    0.875 1.    1.    1.    1.    0.875]

mean value: 0.945

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.92857143 1.         0.92857143 0.92857143 1.
 1.         1.         1.         0.92857143]

mean value: 0.95

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.85714286 1.         0.85714286 0.875      1.
 1.         1.         1.         0.875     ]

mean value: 0.9164285714285714

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02006388 0.02124453 0.02162552 0.03351927 0.02162671 0.02885532
 0.0461483  0.0459609  0.0458951  0.04586172]

mean value: 0.033080124855041505

key: score_time
value: [0.01186085 0.01175904 0.01173973 0.01177478 0.01200271 0.01466799
 0.01861906 0.02242827 0.02122092 0.02015448]

mean value: 0.015622782707214355

key: test_mcc
value: [0.         0.4472136  0.57735027 0.71428571 0.52223297 0.28867513
 0.4472136  0.71428571 0.8660254  0.31622777]

mean value: 0.4893510161024153

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.5        0.71428571 0.78571429 0.85714286 0.71428571 0.64285714
 0.71428571 0.85714286 0.92857143 0.64285714]

mean value: 0.7357142857142858

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.58823529 0.75       0.8        0.85714286 0.77777778 0.66666667
 0.75       0.85714286 0.93333333 0.70588235]

mean value: 0.7686181139122316

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.66666667 0.75       0.85714286 0.63636364 0.625
 0.66666667 0.85714286 0.875      0.6       ]

mean value: 0.7033982683982684

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.85714286 0.85714286 0.85714286 1.         0.71428571
 0.85714286 0.85714286 1.         0.85714286]

mean value: 0.8571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.5        0.71428571 0.78571429 0.85714286 0.71428571 0.64285714
 0.71428571 0.85714286 0.92857143 0.64285714]

mean value: 0.7357142857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.41666667 0.6        0.66666667 0.75       0.63636364 0.5
 0.6        0.75       0.875      0.54545455]

mean value: 0.6340151515151515

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.6

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01531363 0.008986   0.00917101 0.00857091 0.00846648 0.00874424
 0.00847507 0.01050377 0.00918341 0.00854468]

mean value: 0.009595918655395507

key: score_time
value: [0.00934434 0.00968599 0.00924802 0.00841284 0.00842881 0.00845051
 0.00878501 0.0092361  0.00860786 0.00842166]

mean value: 0.008862113952636719

key: test_mcc
value: [0.1490712  0.14285714 0.1490712  0.28867513 0.4472136  0.17407766
 0.57735027 0.         0.1490712  0.31622777]

mean value: 0.23936151596140331

key: train_mcc
value: [0.50851338 0.33605377 0.38138504 0.36976727 0.47625048 0.38100038
 0.43052839 0.4612481  0.41400434 0.38332594]

mean value: 0.4142077086835709

key: test_accuracy
value: [0.57142857 0.57142857 0.57142857 0.64285714 0.71428571 0.57142857
 0.78571429 0.5        0.57142857 0.64285714]

mean value: 0.6142857142857143

key: train_accuracy
value: [0.75396825 0.66666667 0.69047619 0.68253968 0.73809524 0.69047619
 0.71428571 0.73015873 0.70634921 0.69047619]

mean value: 0.7063492063492064

key: test_fscore
value: [0.5        0.57142857 0.625      0.61538462 0.66666667 0.66666667
 0.8        0.22222222 0.5        0.70588235]

mean value: 0.5873251095309918

key: train_fscore
value: [0.75968992 0.68656716 0.69767442 0.70588235 0.74015748 0.688
 0.72727273 0.72131148 0.71755725 0.70676692]

mean value: 0.7150879710404706

key: test_precision
value: [0.6        0.57142857 0.55555556 0.66666667 0.8        0.54545455
 0.75       0.5        0.6        0.6       ]

mean value: 0.6189105339105339

key: train_precision
value: [0.74242424 0.64788732 0.68181818 0.65753425 0.734375   0.69354839
 0.69565217 0.74576271 0.69117647 0.67142857]

mean value: 0.696160730965246

key: test_recall
value: [0.42857143 0.57142857 0.71428571 0.57142857 0.57142857 0.85714286
 0.85714286 0.14285714 0.42857143 0.85714286]

mean value: 0.6

key: train_recall
value: [0.77777778 0.73015873 0.71428571 0.76190476 0.74603175 0.68253968
 0.76190476 0.6984127  0.74603175 0.74603175]

mean value: 0.7365079365079366

key: test_roc_auc
value: [0.57142857 0.57142857 0.57142857 0.64285714 0.71428571 0.57142857
 0.78571429 0.5        0.57142857 0.64285714]

mean value: 0.6142857142857143

key: train_roc_auc
value: [0.75396825 0.66666667 0.69047619 0.68253968 0.73809524 0.69047619
 0.71428571 0.73015873 0.70634921 0.69047619]

mean value: 0.7063492063492064

key: test_jcc
value: [0.33333333 0.4        0.45454545 0.44444444 0.5        0.5
 0.66666667 0.125      0.33333333 0.54545455]

mean value: 0.43027777777777776

key: train_jcc
value: [0.6125     0.52272727 0.53571429 0.54545455 0.5875     0.52439024
 0.57142857 0.56410256 0.55952381 0.54651163]

mean value: 0.5569852920760465

MCC on Blind test: 0.82

Accuracy on Blind test: 0.9

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01009989 0.01338983 0.01457953 0.01464343 0.01477599 0.0141077
 0.0144155  0.01425958 0.01422143 0.01673651]

mean value: 0.014122939109802246

key: score_time
value: [0.00842857 0.01145411 0.011446   0.01153588 0.01144266 0.01150966
 0.01145935 0.01150632 0.01145983 0.01159811]

mean value: 0.011184048652648926

key: test_mcc
value: [0.42857143 0.57735027 0.74535599 0.8660254  0.8660254  0.74535599
 0.8660254  0.57735027 1.         0.63245553]

mean value: 0.7304515695337532

key: train_mcc
value: [0.95250095 0.98425098 0.95346259 0.96825397 0.98425098 0.90659109
 0.96874225 0.96825397 0.96825397 0.98425098]

mean value: 0.9638811739095032

key: test_accuracy
value: [0.71428571 0.78571429 0.85714286 0.92857143 0.92857143 0.85714286
 0.92857143 0.78571429 1.         0.78571429]

mean value: 0.8571428571428571

key: train_accuracy
value: [0.97619048 0.99206349 0.97619048 0.98412698 0.99206349 0.95238095
 0.98412698 0.98412698 0.98412698 0.99206349]

mean value: 0.9817460317460317

key: test_fscore
value: [0.71428571 0.76923077 0.875      0.92307692 0.93333333 0.875
 0.93333333 0.8        1.         0.82352941]

mean value: 0.864678948502478

key: train_fscore
value: [0.97637795 0.99212598 0.97674419 0.98412698 0.992      0.95384615
 0.984375   0.98412698 0.98412698 0.992     ]

mean value: 0.9819850229281492

key: test_precision
value: [0.71428571 0.83333333 0.77777778 1.         0.875      0.77777778
 0.875      0.75       1.         0.7       ]

mean value: 0.8303174603174603

key: train_precision
value: [0.96875    0.984375   0.95454545 0.98412698 1.         0.92537313
 0.96923077 0.98412698 0.98412698 1.        ]

mean value: 0.9754655310485534

key: test_recall
value: [0.71428571 0.71428571 1.         0.85714286 1.         1.
 1.         0.85714286 1.         1.        ]

mean value: 0.9142857142857143

key: train_recall
value: [0.98412698 1.         1.         0.98412698 0.98412698 0.98412698
 1.         0.98412698 0.98412698 0.98412698]

mean value: 0.9888888888888888

key: test_roc_auc
value: [0.71428571 0.78571429 0.85714286 0.92857143 0.92857143 0.85714286
 0.92857143 0.78571429 1.         0.78571429]

mean value: 0.8571428571428572

key: train_roc_auc
value: [0.97619048 0.99206349 0.97619048 0.98412698 0.99206349 0.95238095
 0.98412698 0.98412698 0.98412698 0.99206349]

mean value: 0.9817460317460318

key: test_jcc
value: [0.55555556 0.625      0.77777778 0.85714286 0.875      0.77777778
 0.875      0.66666667 1.         0.7       ]

mean value: 0.7709920634920635

key: train_jcc
value: [0.95384615 0.984375   0.95454545 0.96875    0.98412698 0.91176471
 0.96923077 0.96875    0.96875    0.98412698]

mean value: 0.9648266051758698

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01382709 0.01273179 0.0128994  0.01273489 0.01316762 0.01284289
 0.01315284 0.01348805 0.01302576 0.01242471]

mean value: 0.01302950382232666

key: score_time
value: [0.01025319 0.01141953 0.01149654 0.01144791 0.01151443 0.01140189
 0.01157451 0.01143384 0.01143146 0.0115211 ]

mean value: 0.01134943962097168

key: test_mcc
value: [-0.17407766  0.71428571  0.52223297  0.8660254   0.71428571  0.74535599
  0.8660254   0.71428571  0.52223297  0.2773501 ]

mean value: 0.5768002320817054

key: train_mcc
value: [0.78446454 0.98425098 0.69451634 0.96825397 1.         0.95250095
 0.89442719 0.88014083 0.71977239 0.68199434]

mean value: 0.8560321533026929

key: test_accuracy
value: [0.42857143 0.85714286 0.71428571 0.92857143 0.85714286 0.85714286
 0.92857143 0.85714286 0.71428571 0.57142857]

mean value: 0.7714285714285715

key: train_accuracy
value: [0.88095238 0.99206349 0.82539683 0.98412698 1.         0.97619048
 0.94444444 0.93650794 0.84126984 0.81746032]

mean value: 0.9198412698412698

key: test_fscore
value: [0.55555556 0.85714286 0.77777778 0.92307692 0.85714286 0.875
 0.93333333 0.85714286 0.6        0.7       ]

mean value: 0.7936172161172161

key: train_fscore
value: [0.89361702 0.99212598 0.85135135 0.98412698 1.         0.97637795
 0.94736842 0.93220339 0.81132075 0.84563758]

mean value: 0.9234129443255544

key: test_precision
value: [0.45454545 0.85714286 0.63636364 1.         0.85714286 0.77777778
 0.875      0.85714286 1.         0.53846154]

mean value: 0.7853576978576978

key: train_precision
value: [0.80769231 0.984375   0.74117647 0.98412698 1.         0.96875
 0.9        1.         1.         0.73255814]

mean value: 0.9118678901942411

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 0.85714286 1.
 1.         0.85714286 0.42857143 1.        ]

mean value: 0.8571428571428571

key: train_recall
value: [1.         1.         1.         0.98412698 1.         0.98412698
 1.         0.87301587 0.68253968 1.        ]

mean value: 0.9523809523809523

key: test_roc_auc
value: [0.42857143 0.85714286 0.71428571 0.92857143 0.85714286 0.85714286
 0.92857143 0.85714286 0.71428571 0.57142857]

mean value: 0.7714285714285715

key: train_roc_auc
value: [0.88095238 0.99206349 0.82539683 0.98412698 1.         0.97619048
 0.94444444 0.93650794 0.84126984 0.81746032]

mean value: 0.9198412698412698

key: test_jcc
value: [0.38461538 0.75       0.63636364 0.85714286 0.75       0.77777778
 0.875      0.75       0.42857143 0.53846154]

mean value: 0.6747932622932623

key: train_jcc
value: [0.80769231 0.984375   0.74117647 0.96875    1.         0.95384615
 0.9        0.87301587 0.68253968 0.73255814]

mean value: 0.8643953627217136

MCC on Blind test: 0.41

Accuracy on Blind test: 0.6

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10028696 0.08939171 0.09007287 0.08964419 0.08995485 0.09116292
 0.08989072 0.09051132 0.08987451 0.09023666]

mean value: 0.09110267162322998

key: score_time
value: [0.01456475 0.01453018 0.01480341 0.01456952 0.01471496 0.01463842
 0.01459575 0.01482821 0.01475573 0.01459885]

mean value: 0.014659976959228516

key: test_mcc
value: [0.63245553 0.71428571 1.         0.8660254  0.74535599 0.8660254
 1.         1.         1.         0.8660254 ]

mean value: 0.8690173450172636

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.85714286 1.         0.92857143 0.85714286 0.92857143
 1.         1.         1.         0.92857143]

mean value: 0.9285714285714286

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.85714286 1.         0.92307692 0.875      0.93333333
 1.         1.         1.         0.93333333]

mean value: 0.9345415858651153

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.85714286 1.         1.         0.77777778 0.875
 1.         1.         1.         0.875     ]

mean value: 0.9084920634920635

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.85714286 1.         0.92857143 0.85714286 0.92857143
 1.         1.         1.         0.92857143]

mean value: 0.9285714285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.75       1.         0.85714286 0.77777778 0.875
 1.         1.         1.         0.875     ]

mean value: 0.8834920634920634

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03448319 0.02736044 0.02808571 0.05007601 0.04261303 0.04503179
 0.0342381  0.04089355 0.04537654 0.02863741]

mean value: 0.0376795768737793

key: score_time
value: [0.01859689 0.02061486 0.02387714 0.03524613 0.02392983 0.01751757
 0.021662   0.03886604 0.02194452 0.02438903]

mean value: 0.02466440200805664

key: test_mcc
value: [0.63245553 0.8660254  1.         0.8660254  0.8660254  0.63245553
 1.         1.         1.         0.8660254 ]

mean value: 0.8729012679205107

key: train_mcc
value: [1.         0.98425098 1.         1.         0.98425098 1.
 1.         0.98425098 1.         1.        ]

mean value: 0.9952752952754429

key: test_accuracy
value: [0.78571429 0.92857143 1.         0.92857143 0.92857143 0.78571429
 1.         1.         1.         0.92857143]

mean value: 0.9285714285714286

key: train_accuracy
value: [1.         0.99206349 1.         1.         0.99206349 1.
 1.         0.99206349 1.         1.        ]

mean value: 0.9976190476190476

key: test_fscore
value: [0.82352941 0.92307692 1.         0.92307692 0.93333333 0.82352941
 1.         1.         1.         0.93333333]

mean value: 0.9359879336349924

key: train_fscore
value: [1.         0.99212598 1.         1.         0.99212598 1.
 1.         0.992      1.         1.        ]

mean value: 0.9976251968503937

key: test_precision
value: [0.7   1.    1.    1.    0.875 0.7   1.    1.    1.    0.875]

mean value: 0.915

key: train_precision
value: [1.       0.984375 1.       1.       0.984375 1.       1.       1.
 1.       1.      ]

mean value: 0.996875

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9714285714285714

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         0.98412698 1.         1.        ]

mean value: 0.9984126984126984

key: test_roc_auc
value: [0.78571429 0.92857143 1.         0.92857143 0.92857143 0.78571429
 1.         1.         1.         0.92857143]

mean value: 0.9285714285714286

key: train_roc_auc
value: [1.         0.99206349 1.         1.         0.99206349 1.
 1.         0.99206349 1.         1.        ]

mean value: 0.9976190476190476

key: test_jcc
value: [0.7        0.85714286 1.         0.85714286 0.875      0.7
 1.         1.         1.         0.875     ]

mean value: 0.8864285714285715

key: train_jcc
value: [1.         0.984375   1.         1.         0.984375   1.
 1.         0.98412698 1.         1.        ]

mean value: 0.9952876984126984

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03126049 0.05000067 0.05051756 0.046978   0.04926348 0.0447371
 0.04575491 0.05056047 0.05645704 0.04601097]

mean value: 0.04715406894683838

key: score_time
value: [0.02221918 0.02074003 0.02268863 0.0243206  0.01368237 0.02017021
 0.02368927 0.02083921 0.02071238 0.02281642]

mean value: 0.021187829971313476

key: test_mcc
value: [0.52223297 0.31622777 0.63245553 0.63245553 0.63245553 0.4472136
 0.8660254  0.74535599 1.         0.57735027]

mean value: 0.6371772590958912

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.71428571 0.64285714 0.78571429 0.78571429 0.78571429 0.71428571
 0.92857143 0.85714286 1.         0.78571429]

mean value: 0.8

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.77777778 0.54545455 0.82352941 0.82352941 0.82352941 0.75
 0.93333333 0.83333333 1.         0.8       ]

mean value: 0.8110487225193107

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.63636364 0.75       0.7        0.7        0.7        0.66666667
 0.875      1.         1.         0.75      ]

mean value: 0.7778030303030303

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.42857143 1.         1.         1.         0.85714286
 1.         0.71428571 1.         0.85714286]

mean value: 0.8857142857142857

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.71428571 0.64285714 0.78571429 0.78571429 0.78571429 0.71428571
 0.92857143 0.85714286 1.         0.78571429]

mean value: 0.8

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.63636364 0.375      0.7        0.7        0.7        0.6
 0.875      0.71428571 1.         0.66666667]

mean value: 0.6967316017316018

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.25

Accuracy on Blind test: 0.6

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.22272778 0.25291204 0.25486708 0.21611547 0.24843001 0.24562812
 0.24883819 0.25277472 0.25142169 0.24883938]

mean value: 0.2442554473876953

key: score_time
value: [0.00956607 0.00925088 0.00964093 0.00936627 0.00943518 0.00998449
 0.00920153 0.00911522 0.01006389 0.00919318]

mean value: 0.00948176383972168

key: test_mcc
value: [0.63245553 0.8660254  1.         0.8660254  1.         0.8660254
 1.         1.         1.         0.8660254 ]

mean value: 0.9096557147171431

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78571429 0.92857143 1.         0.92857143 1.         0.92857143
 1.         1.         1.         0.92857143]

mean value: 0.95

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.92307692 1.         0.92307692 1.         0.92307692
 1.         1.         1.         0.93333333]

mean value: 0.9526093514328808

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7   1.    1.    1.    1.    1.    1.    1.    1.    0.875]

mean value: 0.9575

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.85714286 1.         0.85714286 1.         0.85714286
 1.         1.         1.         1.        ]

mean value: 0.9571428571428571

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78571429 0.92857143 1.         0.92857143 1.         0.92857143
 1.         1.         1.         0.92857143]

mean value: 0.95

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.85714286 1.         0.85714286 1.         0.85714286
 1.         1.         1.         0.875     ]

mean value: 0.9146428571428571

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 1.0

Accuracy on Blind test: 1.0

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01456022 0.01583648 0.01625037 0.01607895 0.01611233 0.01637053
 0.01687026 0.0162425  0.01670003 0.01632261]

mean value: 0.016134428977966308

key: score_time
value: [0.01166463 0.01188278 0.01198721 0.01194906 0.01391625 0.01442289
 0.01482201 0.01400399 0.01370358 0.01407051]

mean value: 0.013242292404174804

key: test_mcc
value: [0.74535599 0.52223297 0.74535599 0.74535599 0.74535599 0.74535599
 0.63245553 0.63245553 0.8660254  0.74535599]

mean value: 0.7125305390718464

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85714286 0.71428571 0.85714286 0.85714286 0.85714286 0.85714286
 0.78571429 0.78571429 0.92857143 0.85714286]

mean value: 0.8357142857142856

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.6        0.83333333 0.83333333 0.83333333 0.83333333
 0.72727273 0.72727273 0.92307692 0.83333333]

mean value: 0.7977622377622378

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.42857143 0.71428571 0.71428571 0.71428571 0.71428571
 0.57142857 0.57142857 0.85714286 0.71428571]

mean value: 0.6714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85714286 0.71428571 0.85714286 0.85714286 0.85714286 0.85714286
 0.78571429 0.78571429 0.92857143 0.85714286]

mean value: 0.8357142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.42857143 0.71428571 0.71428571 0.71428571 0.71428571
 0.57142857 0.57142857 0.85714286 0.71428571]

mean value: 0.6714285714285714

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.0

Accuracy on Blind test: 0.6

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03757238 0.03940368 0.0314343  0.03359485 0.03294992 0.03301167
 0.03303957 0.03291583 0.0330925  0.03312182]

mean value: 0.03401365280151367

key: score_time
value: [0.02323055 0.02016068 0.01181483 0.02312255 0.02275276 0.02220821
 0.02225208 0.02244258 0.02231526 0.02217078]

mean value: 0.021247029304504395

key: test_mcc
value: [0.28867513 0.71428571 0.8660254  0.8660254  0.74535599 0.8660254
 0.74535599 0.71428571 1.         0.63245553]

mean value: 0.7438490291553094

key: train_mcc
value: [0.96825397 0.98425098 0.95250095 0.96825397 0.98425098 0.95250095
 0.96825397 0.95250095 0.96825397 0.96825397]

mean value: 0.966727466727708

key: test_accuracy
value: [0.64285714 0.85714286 0.92857143 0.92857143 0.85714286 0.92857143
 0.85714286 0.85714286 1.         0.78571429]

mean value: 0.8642857142857143

key: train_accuracy
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.99206349 0.97619048
 0.98412698 0.97619048 0.98412698 0.98412698]

mean value: 0.9833333333333333

key: test_fscore
value: [0.66666667 0.85714286 0.93333333 0.92307692 0.875      0.93333333
 0.875      0.85714286 1.         0.82352941]

mean value: 0.8744225382460676

key: train_fscore
value: [0.98412698 0.99212598 0.97637795 0.98412698 0.992      0.97637795
 0.98412698 0.97637795 0.98412698 0.98412698]

mean value: 0.9833894763154605

key: test_precision
value: [0.625      0.85714286 0.875      1.         0.77777778 0.875
 0.77777778 0.85714286 1.         0.7       ]

mean value: 0.834484126984127

key: train_precision
value: [0.98412698 0.984375   0.96875    0.98412698 1.         0.96875
 0.98412698 0.96875    0.98412698 0.98412698]

mean value: 0.981125992063492

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 1.         1.
 1.         0.85714286 1.         1.        ]

mean value: 0.9285714285714286

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9857142857142857

key: test_roc_auc
value: [0.64285714 0.85714286 0.92857143 0.92857143 0.85714286 0.92857143
 0.85714286 0.85714286 1.         0.78571429]

mean value: 0.8642857142857143

key: train_roc_auc
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.99206349 0.97619048
 0.98412698 0.97619048 0.98412698 0.98412698]

mean value: 0.9833333333333334

key: test_jcc
value: [0.5        0.75       0.875      0.85714286 0.77777778 0.875
 0.77777778 0.75       1.         0.7       ]

mean value: 0.7862698412698412

key: train_jcc
value: [0.96875    0.984375   0.95384615 0.96875    0.98412698 0.95384615
 0.96875    0.95384615 0.96875    0.96875   ]

mean value: 0.9673790445665446

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'rsa',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=167)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.23806024 0.24101663 0.23600936 0.27989411 0.26349449 0.21895146
 0.12900424 0.16086125 0.30366659 0.30462027]

mean value: 0.23755786418914795

key: score_time
value: [0.0202539  0.02047062 0.02365375 0.02144051 0.021348   0.0231626
 0.01186275 0.02951169 0.02209163 0.016994  ]

mean value: 0.021078944206237793

key: test_mcc
value: [0.28867513 0.71428571 0.8660254  0.8660254  0.74535599 0.8660254
 0.74535599 0.71428571 1.         0.63245553]

mean value: 0.7438490291553094

key: train_mcc
value: [0.96825397 0.98425098 0.95250095 0.96825397 0.98425098 0.95250095
 0.96825397 0.95250095 0.96825397 0.96825397]

mean value: 0.966727466727708

key: test_accuracy
value: [0.64285714 0.85714286 0.92857143 0.92857143 0.85714286 0.92857143
 0.85714286 0.85714286 1.         0.78571429]

mean value: 0.8642857142857143

key: train_accuracy
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.99206349 0.97619048
 0.98412698 0.97619048 0.98412698 0.98412698]

mean value: 0.9833333333333333

key: test_fscore
value: [0.66666667 0.85714286 0.93333333 0.92307692 0.875      0.93333333
 0.875      0.85714286 1.         0.82352941]

mean value: 0.8744225382460676

key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:188: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_sl.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[0.98412698 0.99212598 0.97637795 0.98412698 0.992      0.97637795
 0.98412698 0.97637795 0.98412698 0.98412698]

mean value: 0.9833894763154605

key: test_precision
value: [0.625      0.85714286 0.875      1.         0.77777778 0.875
 0.77777778 0.85714286 1.         0.7       ]

mean value: 0.834484126984127

key: train_precision
value: [0.98412698 0.984375   0.96875    0.98412698 1.         0.96875
 0.98412698 0.96875    0.98412698 0.98412698]

mean value: 0.981125992063492

key: test_recall
value: [0.71428571 0.85714286 1.         0.85714286 1.         1.
 1.         0.85714286 1.         1.        ]

mean value: 0.9285714285714286

key: train_recall
value: [0.98412698 1.         0.98412698 0.98412698 0.98412698 0.98412698
 0.98412698 0.98412698 0.98412698 0.98412698]

mean value: 0.9857142857142857

key: test_roc_auc
value: [0.64285714 0.85714286 0.92857143 0.92857143 0.85714286 0.92857143
 0.85714286 0.85714286 1.         0.78571429]

mean value: 0.8642857142857143

key: train_roc_auc
value: [0.98412698 0.99206349 0.97619048 0.98412698 0.99206349 0.97619048
 0.98412698 0.97619048 0.98412698 0.98412698]

mean value: 0.9833333333333334

key: test_jcc
value: [0.5        0.75       0.875      0.85714286 0.77777778 0.875
 0.77777778 0.75       1.         0.7       ]

mean value: 0.7862698412698412

key: train_jcc
value: [0.96875    0.984375   0.95384615 0.96875    0.98412698 0.95384615
 0.96875    0.95384615 0.96875    0.96875   ]

mean value: 0.9673790445665446

MCC on Blind test: 0.58

Accuracy on Blind test: 0.8