LSHTM_analysis/scripts/ml/log_pnca_orig.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_orig.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 424

PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation

or_mychisq          102
log10_or_mychisq    102
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123

No. of numerical features: 166
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True
Original Data
 Counter({1: 114, 0: 71}) Data dim: (185, 173)

-------------------------------------------------------------
Successfully split data: ORIGINAL training
actual values: training set
imputed values: blind test set
Train data size: (185, 173)
Test data size: (239, 173)
y_train numbers: Counter({1: 114, 0: 71})
y_train ratio: 0.6228070175438597

y_test_numbers: Counter({0: 120, 1: 119})
y_test ratio: 1.0084033613445378
-------------------------------------------------------------
Simple Random OverSampling
 Counter({0: 114, 1: 114})
(228, 173)
Simple Random UnderSampling
 Counter({0: 71, 1: 71})
(142, 173)
Simple Combined Over and UnderSampling
 Counter({0: 114, 1: 114})
(228, 173)
SMOTE_NC OverSampling
 Counter({0: 114, 1: 114})
(228, 173)

#####################################################################

Running ML analysis: ORIGINAL
Gene name: pncA
Drug name: pyrazinamide

Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_orig/

Sanity checks:
Total input features: 173

Training data size: (185, 173)
Test data size: (239, 173)

Target feature numbers (training data): Counter({1: 114, 0: 71})
Target features ratio (training data): 0.6228070175438597

Target feature numbers (test data): Counter({0: 120, 1: 119})
Target features ratio (test data): 1.0084033613445378

#####################################################################


================================================================

Structural features (n): 34
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
These are:
 ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03524065 0.031986   0.03195024 0.03289199 0.02481055 0.05734158
 0.06366634 0.04798388 0.02917767 0.03274918]

mean value: 0.038779807090759275

key: score_time
value: [0.01227498 0.01202679 0.01314306 0.01324391 0.0123229  0.01349568
 0.01344991 0.01196861 0.01190662 0.01218939]

mean value: 0.01260218620300293

key: test_mcc
value: [0.33796318 0.54761905 0.0952381  0.77380952 0.65477023 0.53246753
 0.89188259 0.12182898 0.2548236  0.2987013 ]

mean value: 0.45091040737717836
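
The "mean value" line above can be reproduced from the ten per-fold values printed with it. This is a minimal sketch, assuming the log reports the plain arithmetic mean over the 10 folds; the variable names are illustrative and are not taken from the original script. The standard deviation is not in the log and is added here only to show how much the folds vary.

```python
from statistics import fmean, stdev

# Per-fold test_mcc values for Logistic Regression, copied from the log
# (the log prints them rounded to 8 decimal places).
test_mcc_folds = [
    0.33796318, 0.54761905, 0.0952381, 0.77380952, 0.65477023,
    0.53246753, 0.89188259, 0.12182898, 0.2548236, 0.2987013,
]

mean_mcc = fmean(test_mcc_folds)    # arithmetic mean across folds
spread_mcc = stdev(test_mcc_folds)  # sample standard deviation (not reported in the log)

print(f"mean test_mcc: {mean_mcc:.6f}")
print(f"fold-to-fold std: {spread_mcc:.6f}")
```

With only about 18 to 19 test samples per fold, a wide spread between folds is expected, so the mean alone may overstate how stable the score is.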

key: train_mcc
value: [0.83287487 0.78705463 0.79925792 0.81149011 0.76271746 0.81037732
 0.8120727  0.82431059 0.82431059 0.84779256]

mean value: 0.8112258763592134

key: test_accuracy
value: [0.68421053 0.78947368 0.57894737 0.89473684 0.84210526 0.77777778
 0.94444444 0.61111111 0.66666667 0.66666667]

mean value: 0.7456140350877193

key: train_accuracy
value: [0.92168675 0.89759036 0.90361446 0.90963855 0.88554217 0.91017964
 0.91017964 0.91616766 0.91616766 0.92814371]

mean value: 0.909891061250992

key: test_fscore
value: [0.75       0.83333333 0.66666667 0.91666667 0.88       0.81818182
 0.95238095 0.72       0.76923077 0.72727273]

mean value: 0.8033732933732933

key: train_fscore
value: [0.93838863 0.92165899 0.92592593 0.93023256 0.91324201 0.93023256
 0.93087558 0.93518519 0.93518519 0.94339623]

mean value: 0.9304322835927279

key: test_precision
value: [0.69230769 0.83333333 0.66666667 0.91666667 0.84615385 0.81818182
 1.         0.64285714 0.66666667 0.72727273]

mean value: 0.781010656010656

key: train_precision
value: [0.91666667 0.86956522 0.87719298 0.88495575 0.85470085 0.89285714
 0.88596491 0.89380531 0.89380531 0.91743119]

mean value: 0.8886945340694777

key: test_recall
value: [0.81818182 0.83333333 0.66666667 0.91666667 0.91666667 0.81818182
 0.90909091 0.81818182 0.90909091 0.72727273]

mean value: 0.8333333333333334

key: train_recall
value: [0.96116505 0.98039216 0.98039216 0.98039216 0.98039216 0.97087379
 0.98058252 0.98058252 0.98058252 0.97087379]

mean value: 0.9766228821625738

key: test_roc_auc
value: [0.65909091 0.77380952 0.54761905 0.88690476 0.81547619 0.76623377
 0.95454545 0.55194805 0.5974026  0.64935065]

mean value: 0.7202380952380952

key: train_roc_auc
value: [0.90915395 0.87300858 0.88082108 0.88863358 0.85738358 0.89168689
 0.88872876 0.89654126 0.89654126 0.91512439]

mean value: 0.8897623339384297

key: test_jcc
value: [0.6        0.71428571 0.5        0.84615385 0.78571429 0.69230769
 0.90909091 0.5625     0.625      0.57142857]

mean value: 0.6806481018981019
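
The test_jcc values track the test_fscore values fold by fold. Assuming "jcc" is the Jaccard score of the positive class and "fscore" is F1 (the log does not state this explicitly), the two are linked by J = F1 / (2 - F1). The sketch below checks that relationship against the logged Logistic Regression folds; the function name is illustrative.

```python
# Per-fold values for Logistic Regression, copied from the log.
test_fscore = [0.75, 0.83333333, 0.66666667, 0.91666667, 0.88,
               0.81818182, 0.95238095, 0.72, 0.76923077, 0.72727273]
test_jcc = [0.6, 0.71428571, 0.5, 0.84615385, 0.78571429,
            0.69230769, 0.90909091, 0.5625, 0.625, 0.57142857]


def jaccard_from_f1(f1: float) -> float:
    """J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) together give J = F1/(2-F1)."""
    return f1 / (2.0 - f1)


derived_jcc = [jaccard_from_f1(f) for f in test_fscore]

# Largest disagreement between the derived and the logged Jaccard values.
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))
print(f"largest difference from logged test_jcc: {max_abs_diff:.2e}")
```

Because one metric is a monotonic function of the other, the two rank the folds identically and carry the same information.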

key: train_jcc
value: [0.88392857 0.85470085 0.86206897 0.86956522 0.84033613 0.86956522
 0.87068966 0.87826087 0.87826087 0.89285714]

mean value: 0.870023349804305

MCC on Blind test: 0.42

Accuracy on Blind test: 0.7
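
For reference, MCC and accuracy can both be computed from the four confusion-matrix counts. The log does not print the blind-test confusion matrix, so the counts below are hypothetical and chosen only for illustration; the function names are likewise assumptions and not part of the original script.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, NOT taken from the log.
tp, tn, fp, fn = 40, 30, 20, 10
print(f"MCC: {mcc(tp, tn, fp, fn):.2f}")
print(f"Accuracy: {accuracy(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the matrix, which is why it can stay low on an imbalanced set even when accuracy looks acceptable.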

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.93467402 1.02162886 0.67857623 0.93024802 0.72668695 0.85950851
 0.93305063 0.77515554 0.76321864 1.07244253]

mean value: 0.8695189952850342

key: score_time
value: [0.01315522 0.01336765 0.01328516 0.01339579 0.0135901  0.01321673
 0.01317739 0.01686883 0.01603985 0.01210642]

mean value: 0.013820314407348632

key: test_mcc
value: [0.60553007 0.45361105 0.67460105 0.80507649 0.77380952 0.66254135
 0.56407607 0.64465837 0.44320263 0.2987013 ]

mean value: 0.5925807909458823

key: train_mcc
value: [1.         1.         1.         1.         1.         0.98737524
 1.         0.98737524 1.         0.91120799]

mean value: 0.9885958461530414

key: test_accuracy
value: [0.78947368 0.73684211 0.84210526 0.89473684 0.89473684 0.83333333
 0.72222222 0.83333333 0.72222222 0.66666667]

mean value: 0.7935672514619883

key: train_accuracy
value: [1.         1.         1.         1.         1.         0.99401198
 1.         0.99401198 1.         0.95808383]

mean value: 0.9946107784431137

key: test_fscore
value: [0.84615385 0.7826087  0.86956522 0.90909091 0.91666667 0.85714286
 0.70588235 0.86956522 0.81481481 0.72727273]

mean value: 0.829876330451778

key: train_fscore
value: [1.         1.         1.         1.         1.         0.99516908
 1.         0.99516908 1.         0.96650718]

mean value: 0.99568453412847

key: test_precision
value: [0.73333333 0.81818182 0.90909091 1.         0.91666667 0.9
 1.         0.83333333 0.6875     0.72727273]

mean value: 0.8525378787878788

key: train_precision
value: [1.         1.         1.         1.         1.         0.99038462
 1.         0.99038462 1.         0.95283019]

mean value: 0.9933599419448476

key: test_recall
value: [1.         0.75       0.83333333 0.83333333 0.91666667 0.81818182
 0.54545455 0.90909091 1.         0.72727273]

mean value: 0.8333333333333334

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.98058252]

mean value: 0.9980582524271845

key: test_roc_auc
value: [0.75       0.73214286 0.8452381  0.91666667 0.88690476 0.83766234
 0.77272727 0.81168831 0.64285714 0.64935065]

mean value: 0.7845238095238095

key: train_roc_auc
value: [1.         1.         1.         1.         1.         0.9921875
 1.         0.9921875  1.         0.95122876]

mean value: 0.9935603762135923

key: test_jcc
value: [0.73333333 0.64285714 0.76923077 0.83333333 0.84615385 0.75
 0.54545455 0.76923077 0.6875     0.57142857]

mean value: 0.7148522311022311

key: train_jcc
value: [1.         1.         1.         1.         1.         0.99038462
 1.         0.99038462 1.         0.93518519]

mean value: 0.9915954415954416

MCC on Blind test: 0.23

Accuracy on Blind test: 0.61

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01257062 0.01140189 0.00913525 0.00892401 0.00878763 0.00891304
 0.00894451 0.00909281 0.00889635 0.00975704]

mean value: 0.009642314910888673

key: score_time
value: [0.01189208 0.00911927 0.00909996 0.00931287 0.00870299 0.0086081
 0.00883341 0.00891304 0.00927591 0.00955582]

mean value: 0.009331345558166504

key: test_mcc
value: [0.34405118 0.26772484 0.03912304 0.40849122 0.14085904 0.26856633
 0.2987013  0.06493506 0.56061191 0.40291148]

mean value: 0.2795975400517249

key: train_mcc
value: [0.57098929 0.35088235 0.40877514 0.55947749 0.40877514 0.55309666
 0.46678391 0.53583369 0.49453247 0.45408591]

mean value: 0.4803232036222982

key: test_accuracy
value: [0.68421053 0.68421053 0.57894737 0.73684211 0.63157895 0.66666667
 0.66666667 0.55555556 0.77777778 0.72222222]

mean value: 0.67046783625731

key: train_accuracy
value: [0.80120482 0.70481928 0.72891566 0.79518072 0.72891566 0.79041916
 0.75449102 0.78443114 0.76646707 0.74850299]

mean value: 0.7603347521823822

key: test_fscore
value: [0.76923077 0.78571429 0.69230769 0.81481481 0.74074074 0.75
 0.72727273 0.63636364 0.84615385 0.7826087 ]

mean value: 0.7545207208250686

key: train_fscore
value: [0.84507042 0.79324895 0.79638009 0.84545455 0.79638009 0.83253589
 0.81278539 0.8317757  0.8202765  0.80733945]

mean value: 0.8181247015599945

key: test_precision
value: [0.66666667 0.6875     0.64285714 0.73333333 0.66666667 0.69230769
 0.72727273 0.63636364 0.73333333 0.75      ]

mean value: 0.6936301198801199

key: train_precision
value: [0.81818182 0.6962963  0.7394958  0.78813559 0.7394958  0.82075472
 0.76724138 0.8018018  0.78070175 0.76521739]

mean value: 0.7717322348120701

key: test_recall
value: [0.90909091 0.91666667 0.75       0.91666667 0.83333333 0.81818182
 0.72727273 0.63636364 1.         0.81818182]

mean value: 0.8325757575757575

key: train_recall
value: [0.87378641 0.92156863 0.8627451  0.91176471 0.8627451  0.84466019
 0.86407767 0.86407767 0.86407767 0.85436893]

mean value: 0.8723872073101084

key: test_roc_auc
value: [0.64204545 0.60119048 0.51785714 0.67261905 0.55952381 0.62337662
 0.64935065 0.53246753 0.71428571 0.69480519]

mean value: 0.6207521645021645

key: train_roc_auc
value: [0.77816305 0.64047181 0.68918505 0.76056985 0.68918505 0.7738926
 0.72110133 0.76016383 0.73672633 0.71624697]

mean value: 0.7265705877820383

key: test_jcc
value: [0.625      0.64705882 0.52941176 0.6875     0.58823529 0.6
 0.57142857 0.46666667 0.73333333 0.64285714]

mean value: 0.6091491596638655

key: train_jcc
value: [0.73170732 0.65734266 0.66165414 0.73228346 0.66165414 0.71311475
 0.68461538 0.712      0.6953125  0.67692308]

mean value: 0.6926607425296271

MCC on Blind test: 0.45

Accuracy on Blind test: 0.71
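
One way to read the three completed runs side by side is to compare mean train_mcc with mean test_mcc for each model. The sketch below uses only the mean values and blind-test MCC figures printed in this log; the dictionary layout and the "gap" measure are illustrative assumptions, not part of the original pipeline.

```python
# Mean cross-validation MCC and blind-test MCC, copied from the log.
cv_means = {
    "Logistic Regression": {
        "train_mcc": 0.8112258763592134,
        "test_mcc": 0.45091040737717836,
        "blind_mcc": 0.42,
    },
    "Logistic RegressionCV": {
        "train_mcc": 0.9885958461530414,
        "test_mcc": 0.5925807909458823,
        "blind_mcc": 0.23,
    },
    "Gaussian NB": {
        "train_mcc": 0.4803232036222982,
        "test_mcc": 0.2795975400517249,
        "blind_mcc": 0.45,
    },
}

# Train-minus-test gap: a rough indicator of overfitting within cross-validation.
gaps = {name: m["train_mcc"] - m["test_mcc"] for name, m in cv_means.items()}

for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    blind = cv_means[name]["blind_mcc"]
    print(f"{name:<22} train-test gap: {gap:.3f}  blind MCC: {blind:.2f}")

largest_gap_model = max(gaps, key=gaps.get)
```

A large gap combined with a low blind-test MCC suggests the model fits the training folds more closely than it generalises, although with a data set of this size the comparison is only indicative.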

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01034927 0.00944018 0.00949478 0.00973845 0.00962996 0.00887012
 0.0089128  0.00977898 0.00905228 0.00937176]

mean value: 0.009463858604431153

key: score_time
value: [0.00965571 0.00940299 0.00865912 0.00867152 0.009197   0.00870323
 0.00895262 0.0092957  0.00875807 0.00866055]

mean value: 0.008995652198791504

key: test_mcc
value: [ 0.23262105  0.23262105 -0.01163105  0.28690229  0.32142857  0.34188173
 -0.02548236 -0.32232919 -0.16883117  0.43320011]

mean value: 0.1320381044112035

key: train_mcc
value: [0.38992541 0.37624725 0.38970588 0.37720787 0.42954422 0.36848818
 0.4353138  0.48789999 0.33479889 0.37453283]

mean value: 0.39636643214511924

key: test_accuracy
value: [0.63157895 0.63157895 0.47368421 0.68421053 0.68421053 0.66666667
 0.5        0.38888889 0.44444444 0.72222222]

mean value: 0.5827485380116959

key: train_accuracy
value: [0.71084337 0.69879518 0.71084337 0.71084337 0.72891566 0.69461078
 0.73053892 0.76047904 0.68263473 0.7005988 ]

mean value: 0.7129103239304524

key: test_fscore
value: [0.69565217 0.69565217 0.5        0.76923077 0.75       0.7
 0.57142857 0.52173913 0.54545455 0.76190476]

mean value: 0.6511062126279518

key: train_fscore
value: [0.76470588 0.74747475 0.76470588 0.77358491 0.77832512 0.74371859
 0.77832512 0.80952381 0.73891626 0.75247525]

mean value: 0.7651755570317448

key: test_precision
value: [0.66666667 0.72727273 0.625      0.71428571 0.75       0.77777778
 0.6        0.5        0.54545455 0.8       ]

mean value: 0.6706457431457431

key: train_precision
value: [0.77227723 0.77083333 0.76470588 0.74545455 0.78217822 0.77083333
 0.79       0.79439252 0.75       0.76767677]

mean value: 0.7708351831059962

key: test_recall
value: [0.72727273 0.66666667 0.41666667 0.83333333 0.75       0.63636364
 0.54545455 0.54545455 0.54545455 0.72727273]

mean value: 0.6393939393939394

key: train_recall
value: [0.75728155 0.7254902  0.76470588 0.80392157 0.7745098  0.7184466
 0.76699029 0.82524272 0.72815534 0.73786408]

mean value: 0.7602608033504664

key: test_roc_auc
value: [0.61363636 0.61904762 0.49404762 0.63095238 0.66071429 0.67532468
 0.48701299 0.34415584 0.41558442 0.72077922]

mean value: 0.5661255411255411

key: train_roc_auc
value: [0.69610109 0.6908701  0.69485294 0.68321078 0.7153799  0.6873483
 0.71943265 0.74074636 0.66876517 0.68924454]

mean value: 0.6985951834212649

key: test_jcc
value: [0.53333333 0.53333333 0.33333333 0.625      0.6        0.53846154
 0.4        0.35294118 0.375      0.61538462]

mean value: 0.4906787330316742

key: train_jcc
value: [0.61904762 0.59677419 0.61904762 0.63076923 0.63709677 0.592
 0.63709677 0.68       0.5859375  0.6031746 ]

mean value: 0.6200944313974556

MCC on Blind test: 0.45

Accuracy on Blind test: 0.72
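
The test_fscore and test_jcc rows in the Naive Bayes block above appear to carry the same information: for a binary positive class, the Jaccard index follows from F1 as J = F1 / (2 - F1). A minimal sketch checking that relation against the logged per-fold values (the helper name jaccard_from_f1 is hypothetical and not part of the original script):

```python
# Per-fold values copied from the Naive Bayes block above.
test_fscore = [0.69565217, 0.69565217, 0.5, 0.76923077, 0.75, 0.7,
               0.57142857, 0.52173913, 0.54545455, 0.76190476]
test_jcc = [0.53333333, 0.53333333, 0.33333333, 0.625, 0.6, 0.53846154,
            0.4, 0.35294118, 0.375, 0.61538462]


def jaccard_from_f1(f1):
    """Binary Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# Derive the Jaccard index fold by fold and compare with the logged row.
derived_jcc = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))
print(f"largest |derived - logged| = {max_abs_diff:.2e}")
```

Because one metric is a monotone function of the other, ranking models by test_jcc or by test_fscore within a single fold gives the same order.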

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.01021385 0.0117085  0.00883484 0.00892735 0.0085423  0.00943303
 0.00997519 0.00961924 0.00955629 0.00957227]

mean value: 0.009638285636901856

key: score_time
value: [0.05058432 0.03134537 0.01030421 0.00968504 0.01054215 0.01008773
 0.01054263 0.01056457 0.01067567 0.01038671]

mean value: 0.01647183895111084

key: test_mcc
value: [-0.07954545 -0.26196842 -0.28414557 -0.33071891 -0.20865621 -0.16883117
 -0.05096472 -0.0805823  -0.42640143  0.26856633]

mean value: -0.1623247858043403

key: train_mcc
value: [0.40149161 0.42213076 0.37917381 0.40791958 0.42567075 0.39903847
 0.35572255 0.39451676 0.39528332 0.41049956]

mean value: 0.399144718616529

key: test_accuracy
value: [0.47368421 0.52631579 0.42105263 0.47368421 0.47368421 0.44444444
 0.55555556 0.5        0.38888889 0.66666667]

mean value: 0.49239766081871345

key: train_accuracy
value: [0.72891566 0.73493976 0.71686747 0.72891566 0.73493976 0.7245509
 0.70658683 0.7245509  0.7245509  0.73053892]

mean value: 0.7255356756366784

key: test_fscore
value: [0.54545455 0.68965517 0.56       0.64285714 0.61538462 0.54545455
 0.69230769 0.60869565 0.56       0.75      ]

mean value: 0.6209809366046247

key: train_fscore
value: [0.80176211 0.8018018  0.79111111 0.79820628 0.7962963  0.79090909
 0.78026906 0.8        0.79646018 0.79820628]

mean value: 0.7955022205996671

key: test_precision
value: [0.54545455 0.58823529 0.53846154 0.5625     0.57142857 0.54545455
 0.6        0.58333333 0.5        0.69230769]

mean value: 0.5727175520557873

key: train_precision
value: [0.73387097 0.74166667 0.72357724 0.73553719 0.75438596 0.74358974
 0.725      0.72440945 0.73170732 0.74166667]

mean value: 0.7355411201324364

key: test_recall
value: [0.54545455 0.83333333 0.58333333 0.75       0.66666667 0.54545455
 0.81818182 0.63636364 0.63636364 0.81818182]

mean value: 0.6833333333333333

key: train_recall
value: [0.88349515 0.87254902 0.87254902 0.87254902 0.84313725 0.84466019
 0.84466019 0.89320388 0.87378641 0.86407767]

mean value: 0.8664667808871122

key: test_roc_auc
value: [0.46022727 0.41666667 0.36309524 0.375      0.4047619  0.41558442
 0.48051948 0.46103896 0.31818182 0.62337662]

mean value: 0.4318452380952381

key: train_roc_auc
value: [0.67984281 0.69408701 0.67064951 0.68627451 0.70281863 0.6879551
 0.6645176  0.67316444 0.6790807  0.68985133]

mean value: 0.6828241642530799

key: test_jcc
value: [0.375      0.52631579 0.38888889 0.47368421 0.44444444 0.375
 0.52941176 0.4375     0.38888889 0.6       ]

mean value: 0.45391339869281044

key: train_jcc
value: [0.66911765 0.66917293 0.65441176 0.6641791  0.66153846 0.65413534
 0.63970588 0.66666667 0.66176471 0.6641791 ]

mean value: 0.6604871607837044

MCC on Blind test: 0.08

Accuracy on Blind test: 0.54
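
Each metric in this log is printed as a "key:" line, a bracketed "value:" array (wrapped over two lines), and a "mean value:" line. A minimal sketch, assuming that layout holds, of parsing one such record with the standard library and confirming that the logged mean is the arithmetic mean of the ten folds (the names RECORD and parse_metrics are hypothetical):

```python
import re
from statistics import mean

# Excerpt copied from the K-Nearest Neighbors block above.
LOG_EXCERPT = """\
key: test_mcc
value: [-0.07954545 -0.26196842 -0.28414557 -0.33071891 -0.20865621 -0.16883117
 -0.05096472 -0.0805823  -0.42640143  0.26856633]

mean value: -0.1623247858043403
"""

# One record: "key:" line, bracketed (possibly wrapped) values, "mean value:" line.
RECORD = re.compile(r"key: (\w+)\nvalue: \[([^\]]*)\]\s*mean value: (\S+)")


def parse_metrics(text):
    """Return {key: (per_fold_values, logged_mean)} for every record found."""
    records = {}
    for key, values, logged_mean in RECORD.findall(text):
        folds = [float(v) for v in values.split()]
        records[key] = (folds, float(logged_mean))
    return records


metrics = parse_metrics(LOG_EXCERPT)
folds, logged_mean = metrics["test_mcc"]
print(len(folds), mean(folds), logged_mean)
```

The recomputed mean can differ from the logged one in the last digits because the per-fold values are printed to eight decimals.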

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0124259  0.0118289  0.01121783 0.01115513 0.01249623 0.01109719
 0.01236582 0.01226473 0.01196218 0.01111054]

mean value: 0.011792445182800293

key: score_time
value: [0.01009512 0.00946808 0.00996041 0.0101192  0.00939679 0.01001549
 0.00991511 0.0099535  0.00997281 0.00915146]

mean value: 0.009804797172546387

key: test_mcc
value: [ 0.40219983  0.26772484  0.3086067   0.3086067   0.3086067   0.39594419
  0.39594419 -0.05096472  0.3040345   0.39594419]

mean value: 0.303664710755213

key: train_mcc
value: [0.5635375  0.54404241 0.59782919 0.56865593 0.53158234 0.54476067
 0.53640723 0.58634752 0.54476067 0.59862298]

mean value: 0.5616546427256583

key: test_accuracy
value: [0.68421053 0.68421053 0.68421053 0.68421053 0.68421053 0.72222222
 0.72222222 0.55555556 0.66666667 0.72222222]

mean value: 0.6809941520467836

key: train_accuracy
value: [0.78313253 0.77108434 0.80120482 0.78313253 0.76506024 0.77245509
 0.77245509 0.79640719 0.77245509 0.80239521]

mean value: 0.7819782122501984

key: test_fscore
value: [0.78571429 0.78571429 0.8        0.8        0.8        0.8
 0.8        0.69230769 0.78571429 0.8       ]

mean value: 0.784945054945055

/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
key: train_fscore
value: [0.85123967 0.84297521 0.85957447 0.85       0.83950617 0.8442623
 0.84297521 0.85714286 0.8442623  0.86075949]

mean value: 0.8492697664546919

key: test_precision
value: [0.64705882 0.6875     0.66666667 0.66666667 0.66666667 0.71428571
 0.71428571 0.6        0.64705882 0.71428571]

mean value: 0.6724474789915966

key: train_precision
value: [0.74100719 0.72857143 0.7593985  0.73913043 0.72340426 0.73049645
 0.73381295 0.75555556 0.73049645 0.76119403]

mean value: 0.74030672520064

key: test_recall
value: [1.         0.91666667 1.         1.         1.         0.90909091
 0.90909091 0.81818182 1.         0.90909091]

mean value: 0.9462121212121212

key: train_recall
value: [1.         1.         0.99019608 1.         1.         1.
 0.99029126 0.99029126 1.         0.99029126]

mean value: 0.9961069864839139

key: test_roc_auc
value: [0.625      0.60119048 0.57142857 0.57142857 0.57142857 0.66883117
 0.66883117 0.48051948 0.57142857 0.66883117]

mean value: 0.5998917748917749

key: train_roc_auc
value: [0.71428571 0.703125   0.74509804 0.71875    0.6953125  0.703125
 0.70608313 0.73733313 0.703125   0.74514563]

mean value: 0.7171383146705284

key: test_jcc
value: [0.64705882 0.64705882 0.66666667 0.66666667 0.66666667 0.66666667
 0.66666667 0.52941176 0.64705882 0.66666667]

mean value: 0.6470588235294118

key: train_jcc
value: [0.74100719 0.72857143 0.75373134 0.73913043 0.72340426 0.73049645
 0.72857143 0.75       0.73049645 0.75555556]

mean value: 0.7380964548129775

MCC on Blind test: 0.37

Accuracy on Blind test: 0.64
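
The MCC values in this log can be related back to confusion-matrix counts. A minimal sketch using the standard MCC formula. The counts are an inference, not logged output: SVM fold 1 above reports recall 1.0, precision 0.64705882 (11/17) and accuracy 0.68421053 (13/19), which is consistent with TP=11, FN=0, FP=6, TN=2 on a 19-sample fold.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention used here: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Counts inferred from the logged fold-1 precision, recall and accuracy
# (an assumption; the log itself does not print confusion matrices).
tp, fn, fp, tn = 11, 0, 6, 2

fold1_precision = tp / (tp + fp)
fold1_recall = tp / (tp + fn)
fold1_accuracy = (tp + tn) / (tp + tn + fp + fn)
fold1_mcc = mcc(tp, tn, fp, fn)
print(fold1_precision, fold1_recall, fold1_accuracy, fold1_mcc)
```

Under these inferred counts the recomputed MCC agrees with the first test_mcc entry of the SVM block (0.40219983). The high recall comes with 6 of 8 negatives misclassified, which is why MCC stays modest.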

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.14798498 0.80445504 0.68168712 0.778126   0.67166209 0.71176672
 0.82175732 0.63936996 0.6600368  0.79345798]

mean value: 0.7710304021835327

key: score_time
value: [0.01561236 0.01476002 0.01496482 0.01485348 0.01530099 0.01830029
 0.01525712 0.01215148 0.0121603  0.02092552]

mean value: 0.015428638458251953

key: test_mcc
value: [ 0.33796318  0.18531233  0.32142857  0.32142857  0.45361105  0.64465837
  0.79772404 -0.0805823   0.01413507  0.40291148]

mean value: 0.33985903712040866

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.68421053 0.63157895 0.68421053 0.68421053 0.73684211 0.83333333
 0.88888889 0.5        0.55555556 0.72222222]

mean value: 0.6921052631578948

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.72       0.75       0.75       0.7826087  0.86956522
 0.9        0.60869565 0.66666667 0.7826087 ]

mean value: 0.7580144927536232

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.69230769 0.69230769 0.75       0.75       0.81818182 0.83333333
 1.         0.58333333 0.61538462 0.75      ]

mean value: 0.7484848484848485

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.75       0.75       0.75       0.75       0.90909091
 0.81818182 0.63636364 0.72727273 0.81818182]

mean value: 0.7727272727272727

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.65909091 0.58928571 0.66071429 0.66071429 0.73214286 0.81168831
 0.90909091 0.46103896 0.50649351 0.69480519]

mean value: 0.6685064935064935

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.5625     0.6        0.6        0.64285714 0.76923077
 0.81818182 0.4375     0.5        0.64285714]

mean value: 0.6173126873126873

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.0201478  0.01317859 0.01518607 0.01245022 0.01219487 0.01223302
 0.01228309 0.01388884 0.01354384 0.01307154]

mean value: 0.013817787170410156

key: score_time
value: [0.01178908 0.00907445 0.00897789 0.00866628 0.00874162 0.00875974
 0.00865197 0.008744   0.00954509 0.00889468]

mean value: 0.009184479713439941

key: test_mcc
value: [0.45361105 1.         0.65133895 0.80507649 0.54761905 0.66254135
 0.79772404 0.26856633 0.88640526 0.53246753]

mean value: 0.6605350037589758

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 1.         0.78947368 0.89473684 0.78947368 0.83333333
 0.88888889 0.66666667 0.94444444 0.77777778]

mean value: 0.8321637426900584

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7826087  1.         0.8        0.90909091 0.83333333 0.85714286
 0.9        0.75       0.95652174 0.81818182]

mean value: 0.8606879352531527

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       1.         1.         1.         0.83333333 0.9
 1.         0.69230769 0.91666667 0.81818182]

mean value: 0.8910489510489511

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.66666667 0.83333333 0.83333333 0.81818182
 0.81818182 0.81818182 1.         0.81818182]

mean value: 0.8424242424242424

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.72159091 1.         0.83333333 0.91666667 0.77380952 0.83766234
 0.90909091 0.62337662 0.92857143 0.76623377]

mean value: 0.8310335497835498

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.64285714 1.         0.66666667 0.83333333 0.71428571 0.75
 0.81818182 0.6        0.91666667 0.69230769]

mean value: 0.7634299034299035

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.52
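
One way to read the blocks above side by side is to compare mean train MCC, mean cross-validated test MCC and blind-test MCC per model. A minimal sketch with the values rounded to three decimals from the log; the summary dict and the gaps helper are hypothetical and not part of the original script:

```python
# (mean train_mcc, mean test_mcc, blind-test MCC), rounded from the blocks above.
summary = {
    "Naive Bayes": (0.396, 0.132, 0.45),
    "K-Nearest Neighbors": (0.399, -0.162, 0.08),
    "SVM": (0.562, 0.304, 0.37),
    "MLP": (1.000, 0.340, 0.26),
    "Decision Tree": (1.000, 0.661, 0.06),
}


def gaps(train, cv, blind):
    """Return (train minus CV, CV minus blind) MCC differences."""
    return round(train - cv, 3), round(cv - blind, 3)


# Sort by the train-vs-CV gap, largest first.
rows = sorted(
    ((name, *gaps(*vals)) for name, vals in summary.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, train_cv, cv_blind in rows:
    print(f"{name:<20} train-CV={train_cv:+.3f}  CV-blind={cv_blind:+.3f}")
```

A large train-CV gap (MLP, Decision Tree with train MCC of 1.0) suggests the model memorises the training folds. A large CV-blind gap, as for the Decision Tree, suggests the cross-validated score may not carry over to the blind set.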

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09460139 0.09494543 0.09293103 0.09291887 0.09456015 0.09315991
 0.09335065 0.09217739 0.09356213 0.10142875]

mean value: 0.09436357021331787

key: score_time
value: [0.01741433 0.01736999 0.01729822 0.01721883 0.01758862 0.01811028
 0.01707053 0.01714253 0.01738429 0.01856899]

mean value: 0.017516660690307616

key: test_mcc
value: [0.08257228 0.53468154 0.1495142  0.42004128 0.32142857 0.39594419
 0.76623377 0.39594419 0.2548236  0.20385888]

mean value: 0.35250424955725035

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.57894737 0.78947368 0.57894737 0.73684211 0.68421053 0.72222222
 0.88888889 0.72222222 0.66666667 0.61111111]

mean value: 0.697953216374269

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.69230769 0.84615385 0.63636364 0.8        0.75       0.8
 0.90909091 0.8        0.76923077 0.66666667]

mean value: 0.766981351981352

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.6        0.78571429 0.7        0.76923077 0.75       0.71428571
 0.90909091 0.71428571 0.66666667 0.7       ]

mean value: 0.7309274059274059

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.91666667 0.58333333 0.83333333 0.75       0.90909091
 0.90909091 0.90909091 0.90909091 0.63636364]

mean value: 0.8174242424242424

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.53409091 0.74404762 0.57738095 0.70238095 0.66071429 0.66883117
 0.88311688 0.66883117 0.5974026  0.6038961 ]

mean value: 0.664069264069264

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.52941176 0.73333333 0.46666667 0.66666667 0.6        0.66666667
 0.83333333 0.66666667 0.625      0.5       ]

mean value: 0.6287745098039216

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.32

Accuracy on Blind test: 0.65

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01018333 0.0099957  0.00997639 0.00996017 0.01004481 0.00998259
 0.00940919 0.00883722 0.00895667 0.00889516]

mean value: 0.009624123573303223

key: score_time
value: [0.00936508 0.00946832 0.00939035 0.00943542 0.00943351 0.00934362
 0.00854778 0.00865889 0.0086751  0.00854993]

mean value: 0.009086799621582032

key: test_mcc
value: [ 0.25844328  0.0952381   0.13095238  0.1495142   0.32142857  0.43320011
  0.48416483 -0.0805823   0.64465837  0.40291148]

mean value: 0.2839929034172316

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.63157895 0.57894737 0.52631579 0.57894737 0.68421053 0.72222222
 0.72222222 0.5        0.83333333 0.72222222]

mean value: 0.65

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.66666667 0.52631579 0.63636364 0.75       0.76190476
 0.73684211 0.60869565 0.86956522 0.7826087 ]

mean value: 0.7005629191555965

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.66666667 0.71428571 0.7        0.75       0.8
 0.875      0.58333333 0.83333333 0.75      ]

mean value: 0.7372619047619048

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.66666667 0.41666667 0.58333333 0.75       0.72727273
 0.63636364 0.63636364 0.90909091 0.81818182]

mean value: 0.678030303030303

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.63068182 0.54761905 0.56547619 0.57738095 0.66071429 0.72077922
 0.74675325 0.46103896 0.81168831 0.69480519]

mean value: 0.641693722943723

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.5        0.35714286 0.46666667 0.6        0.61538462
 0.58333333 0.4375     0.76923077 0.64285714]

mean value: 0.5472115384615385

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.25

Accuracy on Blind test: 0.62

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.21576595 1.23462939 1.23632216 1.24061584 1.27298141 1.25783706
 1.2260108  1.21766496 1.2200973  1.23109865]

mean value: 1.2353023529052733

key: score_time
value: [0.08846188 0.09100533 0.09566069 0.15491486 0.09412932 0.08944058
 0.08805823 0.09122467 0.09319806 0.094805  ]

mean value: 0.09808986186981201

key: test_mcc
value: [0.60553007 0.88949918 0.67460105 0.65477023 0.56694671 0.64465837
 0.89188259 0.39594419 0.77742884 0.52299758]

mean value: 0.6624258812978574

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.94736842 0.84210526 0.84210526 0.78947368 0.83333333
 0.94444444 0.72222222 0.88888889 0.77777778]

mean value: 0.837719298245614

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84615385 0.96       0.86956522 0.88       0.85714286 0.86956522
 0.95238095 0.8        0.91666667 0.83333333]

mean value: 0.8784808090460264

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.73333333 0.92307692 0.90909091 0.84615385 0.75       0.83333333
 1.         0.71428571 0.84615385 0.76923077]

mean value: 0.8324658674658675

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.83333333 0.91666667 1.         0.90909091
 0.90909091 0.90909091 1.         0.90909091]

mean value: 0.9386363636363636

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.92857143 0.8452381  0.81547619 0.71428571 0.81168831
 0.95454545 0.66883117 0.85714286 0.74025974]

mean value: 0.8086038961038962

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73333333 0.92307692 0.76923077 0.78571429 0.75       0.76923077
 0.90909091 0.66666667 0.84615385 0.71428571]

mean value: 0.7866783216783216

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.29

Accuracy on Blind test: 0.62

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [1.74438    0.86402607 0.87671947 0.88783979 0.97754407 0.87419868
 0.87052846 0.87867832 0.88394403 0.91006446]

mean value: 0.9767923355102539

key: score_time
value: [0.22032857 0.17637062 0.18659782 0.2488842  0.18379068 0.22443295
 0.2052598  0.24423671 0.17268133 0.18593717]

mean value: 0.20485198497772217

key: test_mcc
value: [0.60553007 0.65477023 0.53468154 0.88949918 0.53468154 0.39594419
 0.76623377 0.2548236  0.67005939 0.67005939]

mean value: 0.5976282906983714

key: train_mcc
value: [0.89849587 0.88685769 0.87457979 0.87457979 0.88685769 0.88899836
 0.8872319  0.91188694 0.86279135 0.89953068]

mean value: 0.8871810060846004

key: test_accuracy
value: [0.78947368 0.84210526 0.78947368 0.94736842 0.78947368 0.72222222
 0.88888889 0.66666667 0.83333333 0.83333333]

mean value: 0.810233918128655

key: train_accuracy
value: [0.95180723 0.94578313 0.93975904 0.93975904 0.94578313 0.94610778
 0.94610778 0.95808383 0.93413174 0.95209581]

mean value: 0.9459418512372845

key: test_fscore
value: [0.84615385 0.88       0.84615385 0.96       0.84615385 0.8
 0.90909091 0.76923077 0.88       0.88      ]

mean value: 0.8616783216783217

key: train_fscore
value: [0.96226415 0.95734597 0.95283019 0.95283019 0.95734597 0.95813953
 0.95774648 0.96682464 0.94883721 0.96226415]

mean value: 0.9576428489982294

key: test_precision
value: [0.73333333 0.84615385 0.78571429 0.92307692 0.78571429 0.71428571
 0.90909091 0.66666667 0.78571429 0.78571429]

mean value: 0.7935464535464536

key: train_precision
value: [0.93577982 0.9266055  0.91818182 0.91818182 0.9266055  0.91964286
 0.92727273 0.94444444 0.91071429 0.93577982]

mean value: 0.9263208593139786

key: test_recall
value: [1.         0.91666667 0.91666667 1.         0.91666667 0.90909091
 0.90909091 0.90909091 1.         1.        ]

mean value: 0.9477272727272728

key: train_recall
value: [0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 1.
 0.99029126 0.99029126 0.99029126 0.99029126]

mean value: 0.9912240624405102

key: test_roc_auc
value: [0.75       0.81547619 0.74404762 0.92857143 0.74404762 0.66883117
 0.88311688 0.5974026  0.78571429 0.78571429]

mean value: 0.7702922077922078

key: train_roc_auc
value: [0.93959008 0.93259804 0.92478554 0.92478554 0.93259804 0.9296875
 0.93264563 0.94827063 0.91702063 0.94045813]

mean value: 0.9322439756646995

key: test_jcc
value: [0.73333333 0.78571429 0.73333333 0.92307692 0.73333333 0.66666667
 0.83333333 0.625      0.78571429 0.78571429]

mean value: 0.760521978021978

key: train_jcc
value: [0.92727273 0.91818182 0.90990991 0.90990991 0.91818182 0.91964286
 0.91891892 0.93577982 0.90265487 0.92727273]

mean value: 0.9187725370561085

MCC on Blind test: 0.35

Accuracy on Blind test: 0.64

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01017141 0.01006699 0.01010513 0.01003337 0.01007533 0.0101161
 0.00903153 0.00938582 0.00902843 0.00992584]

mean value: 0.009793996810913086

key: score_time
value: [0.00961804 0.00945258 0.00932074 0.00960994 0.0095284  0.00941348
 0.00897074 0.0086844  0.00946951 0.00944734]

mean value: 0.009351515769958496

key: test_mcc
value: [ 0.23262105  0.23262105 -0.01163105  0.28690229  0.32142857  0.34188173
 -0.02548236 -0.32232919 -0.16883117  0.43320011]

mean value: 0.1320381044112035

key: train_mcc
value: [0.38992541 0.37624725 0.38970588 0.37720787 0.42954422 0.36848818
 0.4353138  0.48789999 0.33479889 0.37453283]

mean value: 0.39636643214511924

key: test_accuracy
value: [0.63157895 0.63157895 0.47368421 0.68421053 0.68421053 0.66666667
 0.5        0.38888889 0.44444444 0.72222222]

mean value: 0.5827485380116959

key: train_accuracy
value: [0.71084337 0.69879518 0.71084337 0.71084337 0.72891566 0.69461078
 0.73053892 0.76047904 0.68263473 0.7005988 ]

mean value: 0.7129103239304524

key: test_fscore
value: [0.69565217 0.69565217 0.5        0.76923077 0.75       0.7
 0.57142857 0.52173913 0.54545455 0.76190476]

mean value: 0.6511062126279518

key: train_fscore
value: [0.76470588 0.74747475 0.76470588 0.77358491 0.77832512 0.74371859
 0.77832512 0.80952381 0.73891626 0.75247525]

mean value: 0.7651755570317448

key: test_precision
value: [0.66666667 0.72727273 0.625      0.71428571 0.75       0.77777778
 0.6        0.5        0.54545455 0.8       ]

mean value: 0.6706457431457431

key: train_precision
value: [0.77227723 0.77083333 0.76470588 0.74545455 0.78217822 0.77083333
 0.79       0.79439252 0.75       0.76767677]

mean value: 0.7708351831059962

key: test_recall
value: [0.72727273 0.66666667 0.41666667 0.83333333 0.75       0.63636364
 0.54545455 0.54545455 0.54545455 0.72727273]

mean value: 0.6393939393939394

key: train_recall
value: [0.75728155 0.7254902  0.76470588 0.80392157 0.7745098  0.7184466
 0.76699029 0.82524272 0.72815534 0.73786408]

mean value: 0.7602608033504664

key: test_roc_auc
value: [0.61363636 0.61904762 0.49404762 0.63095238 0.66071429 0.67532468
 0.48701299 0.34415584 0.41558442 0.72077922]

mean value: 0.5661255411255411

key: train_roc_auc
value: [0.69610109 0.6908701  0.69485294 0.68321078 0.7153799  0.6873483
 0.71943265 0.74074636 0.66876517 0.68924454]

mean value: 0.6985951834212649

key: test_jcc
value: [0.53333333 0.53333333 0.33333333 0.625      0.6        0.53846154
 0.4        0.35294118 0.375      0.61538462]

mean value: 0.4906787330316742

key: train_jcc
value: [0.61904762 0.59677419 0.61904762 0.63076923 0.63709677 0.592
 0.63709677 0.68       0.5859375  0.6031746 ]

mean value: 0.6200944313974556

MCC on Blind test: 0.45

Accuracy on Blind test: 0.72

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09132409 0.05846572 0.06459308 0.05377865 0.05312014 0.0551908
 0.05610585 0.05561304 0.05613637 0.05491066]

mean value: 0.059923839569091794

key: score_time
value: [0.01047754 0.01103997 0.01097846 0.0104785  0.01049376 0.01075339
 0.01066589 0.01044464 0.01035118 0.01046562]

mean value: 0.010614895820617675

key: test_mcc
value: [0.45361105 0.88949918 0.89559105 1.         0.7824608  0.76623377
 0.89188259 0.39594419 0.88640526 0.43320011]

mean value: 0.7394827993424129

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73684211 0.94736842 0.94736842 1.         0.89473684 0.88888889
 0.94444444 0.72222222 0.94444444 0.72222222]

mean value: 0.8748538011695907

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.7826087  0.96       0.95652174 1.         0.92307692 0.90909091
 0.95238095 0.8        0.95652174 0.76190476]

mean value: 0.900210572036659

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.92307692 1.         1.         0.85714286 0.90909091
 1.         0.71428571 0.91666667 0.8       ]

mean value: 0.887026307026307

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.91666667 1.         1.         0.90909091
 0.90909091 0.90909091 1.         0.72727273]

mean value: 0.918939393939394

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.72159091 0.92857143 0.95833333 1.         0.85714286 0.88311688
 0.95454545 0.66883117 0.92857143 0.72077922]

mean value: 0.8621482683982684

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.64285714 0.92307692 0.91666667 1.         0.85714286 0.83333333
 0.90909091 0.66666667 0.91666667 0.61538462]

mean value: 0.828088578088578

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.53
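
Note: the key / value / mean value blocks above have the shape of the dictionary returned by scikit-learn's cross_validate when return_train_score=True (fit_time, score_time, then test_<name> and train_<name> for each scorer). The following is a minimal sketch of how output in this shape could be produced. It is not the script that wrote this log: the synthetic data, the column names num_0..num_5 and cat_0, the choice of LogisticRegression, the scorer definitions, and the shuffled StratifiedKFold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): 185 rows, 6 numeric + 1 categorical column.
X_num, y = make_classification(n_samples=185, n_features=6, random_state=42)
num_cols = [f"num_{i}" for i in range(6)]
X = pd.DataFrame(X_num, columns=num_cols)
X["cat_0"] = np.random.default_rng(42).choice(["a", "b", "c"], size=len(X))

# Same layout as the logged pipeline: scale numeric columns, one-hot encode categorical ones.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_0"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(max_iter=1000, random_state=42))])

# Scorer names are chosen so the result keys match the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print each metric in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```

Each array holds one entry per fold, so with 10 splits every "value" line has 10 numbers and "mean value" is their arithmetic mean.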

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.04992199 0.06900644 0.06069613 0.06046605 0.05153871 0.09395623
 0.06998301 0.05825424 0.024194   0.04331779]

mean value: 0.05813345909118652

key: score_time
value: [0.02851057 0.03522396 0.02049589 0.02068567 0.02075195 0.01262808
 0.02250338 0.01193881 0.01193452 0.0223732 ]

mean value: 0.02070460319519043

key: test_mcc
value: [0.45868247 0.36803496 0.43034895 0.42004128 0.77380952 0.88640526
 0.48416483 0.4025974  0.12182898 0.48416483]

mean value: 0.4830078495368003

key: train_mcc
value: [0.96182348 0.94915491 0.92371324 0.96223327 0.93656134 0.89835373
 0.97466626 0.9748321  0.96196428 0.96196428]

mean value: 0.9505266878223336

key: test_accuracy
value: [0.73684211 0.68421053 0.68421053 0.73684211 0.89473684 0.94444444
 0.72222222 0.66666667 0.61111111 0.72222222]

mean value: 0.7403508771929824

key: train_accuracy
value: [0.98192771 0.97590361 0.96385542 0.98192771 0.96987952 0.95209581
 0.98802395 0.98802395 0.98203593 0.98203593]

mean value: 0.976570954476589

key: test_fscore
value: [0.8        0.72727273 0.7        0.8        0.91666667 0.95652174
 0.73684211 0.66666667 0.72       0.73684211]

mean value: 0.7760812010262811

key: train_fscore
value: [0.98536585 0.98058252 0.97058824 0.98550725 0.97584541 0.96153846
 0.99029126 0.99038462 0.98550725 0.98550725]

mean value: 0.9811118102041951

key: test_precision
value: [0.71428571 0.8        0.875      0.76923077 0.91666667 0.91666667
 0.875      0.85714286 0.64285714 0.875     ]

mean value: 0.8241849816849817

key: train_precision
value: [0.99019608 0.97115385 0.97058824 0.97142857 0.96190476 0.95238095
 0.99029126 0.98095238 0.98076923 0.98076923]

mean value: 0.9750434550220387

key: test_recall
value: [0.90909091 0.66666667 0.58333333 0.83333333 0.91666667 1.
 0.63636364 0.54545455 0.81818182 0.63636364]

mean value: 0.7545454545454545

key: train_recall
value: [0.98058252 0.99019608 0.97058824 1.         0.99019608 0.97087379
 0.99029126 1.         0.99029126 0.99029126]

mean value: 0.9873310489244241

key: test_roc_auc
value: [0.70454545 0.69047619 0.7202381  0.70238095 0.88690476 0.92857143
 0.74675325 0.7012987  0.55194805 0.74675325]

mean value: 0.737987012987013

key: train_roc_auc
value: [0.98235475 0.97166054 0.96185662 0.9765625  0.96384804 0.94637439
 0.98733313 0.984375   0.97952063 0.97952063]

mean value: 0.9733406236685613

key: test_jcc
value: [0.66666667 0.57142857 0.53846154 0.66666667 0.84615385 0.91666667
 0.58333333 0.5        0.5625     0.58333333]

mean value: 0.6435210622710623

key: train_jcc
value: [0.97115385 0.96190476 0.94285714 0.97142857 0.95283019 0.92592593
 0.98076923 0.98095238 0.97142857 0.97142857]

mean value: 0.9630679191528249

MCC on Blind test: 0.16

Accuracy on Blind test: 0.58
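
Note: the two "Blind test" lines report scores on data held out from cross-validation. A minimal sketch of how such figures could be computed is given below, assuming a simple stratified hold-out split on synthetic data; how the blind set in this run was actually constructed is not shown in the log, and the names X_bts and y_bts are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption); no redundant features, to keep LDA well conditioned.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Hold out a blind set that the model never sees during fitting (assumed 70/30 split).
X_train, X_bts, y_train, y_bts = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipe = Pipeline(steps=[("scale", MinMaxScaler()),
                       ("model", LinearDiscriminantAnalysis())])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_bts)

# Round to two decimals, matching the precision shown in the log.
bts_mcc = round(matthews_corrcoef(y_bts, y_pred), 2)
bts_acc = round(accuracy_score(y_bts, y_pred), 2)
print("MCC on Blind test:", bts_mcc)
print("Accuracy on Blind test:", bts_acc)
```

Comparing these figures with the mean test_mcc and test_accuracy from cross-validation gives a quick check of how well the CV estimate carries over to unseen data.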

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0232811  0.00937867 0.00896811 0.00889492 0.00896215 0.00895619
 0.00900936 0.01012516 0.01084137 0.00878739]

mean value: 0.010720443725585938

key: score_time
value: [0.00920415 0.00897765 0.0090692  0.0088675  0.0088439  0.00873423
 0.00930619 0.00982428 0.00979805 0.00853658]

mean value: 0.009116172790527344

key: test_mcc
value: [ 0.34405118  0.18531233 -0.04941662  0.42004128  0.14085904  0.26856633
  0.2987013   0.16116459  0.0805823   0.26856633]

mean value: 0.2118428048944046

key: train_mcc
value: [0.37947231 0.36682397 0.37021128 0.40845955 0.39898595 0.3183612
 0.34304366 0.3576444  0.42468968 0.34769188]

mean value: 0.37153838914990045

key: test_accuracy
value: [0.68421053 0.63157895 0.52631579 0.73684211 0.63157895 0.66666667
 0.66666667 0.61111111 0.61111111 0.66666667]

mean value: 0.6432748538011696

key: train_accuracy
value: [0.71686747 0.71084337 0.71084337 0.72891566 0.72289157 0.68862275
 0.7005988  0.70658683 0.73652695 0.7005988 ]

mean value: 0.7123295577519659

key: test_fscore
value: [0.76923077 0.72       0.64       0.8        0.74074074 0.75
 0.72727273 0.69565217 0.74074074 0.75      ]

mean value: 0.7333637151898021

key: train_fscore
value: [0.78538813 0.78378378 0.77981651 0.80519481 0.78703704 0.76363636
 0.77477477 0.77828054 0.8018018  0.7706422 ]

mean value: 0.7830355952665203

key: test_precision
value: [0.66666667 0.69230769 0.61538462 0.76923077 0.66666667 0.69230769
 0.72727273 0.66666667 0.625      0.69230769]

mean value: 0.6813811188811189

key: train_precision
value: [0.74137931 0.725      0.73275862 0.72093023 0.74561404 0.71794872
 0.72268908 0.72881356 0.74789916 0.73043478]

mean value: 0.7313467493853907

key: test_recall
value: [0.90909091 0.75       0.66666667 0.83333333 0.83333333 0.81818182
 0.72727273 0.72727273 0.90909091 0.81818182]

mean value: 0.7992424242424243

key: train_recall
value: [0.83495146 0.85294118 0.83333333 0.91176471 0.83333333 0.81553398
 0.83495146 0.83495146 0.86407767 0.81553398]

mean value: 0.8431372549019608

key: test_roc_auc
value: [0.64204545 0.58928571 0.47619048 0.70238095 0.55952381 0.62337662
 0.64935065 0.57792208 0.52597403 0.62337662]

mean value: 0.5969426406926407

key: train_roc_auc
value: [0.67938049 0.66865809 0.67447917 0.67463235 0.69010417 0.64995449
 0.65966323 0.66747573 0.69766383 0.66557949]

mean value: 0.6727591036414566

key: test_jcc
value: [0.625      0.5625     0.47058824 0.66666667 0.58823529 0.6
 0.57142857 0.53333333 0.58823529 0.6       ]

mean value: 0.5805987394957983

key: train_jcc
value: [0.64661654 0.64444444 0.63909774 0.67391304 0.64885496 0.61764706
 0.63235294 0.63703704 0.66917293 0.62686567]

mean value: 0.6436002376478708

MCC on Blind test: 0.43

Accuracy on Blind test: 0.7
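
Note: because every model block follows the same Model_name / key / mean value / Blind test layout, the per-model summaries can be pulled into a dictionary for comparison. The parser below is a sketch written against the layout visible in this log; the function name parse_model_blocks and the output key names are hypothetical, and the sample text is copied from the Multinomial block above.

```python
import re

# Excerpt copied from the Multinomial block of this log.
sample = """\
Model_name: Multinomial
Model func: MultinomialNB()

key: test_mcc
value: [ 0.34405118  0.18531233 -0.04941662  0.42004128  0.14085904  0.26856633
  0.2987013   0.16116459  0.0805823   0.26856633]

mean value: 0.2118428048944046

key: train_mcc
value: [0.37947231 0.36682397 0.37021128 0.40845955 0.39898595 0.3183612
 0.34304366 0.3576444  0.42468968 0.34769188]

mean value: 0.37153838914990045

MCC on Blind test: 0.43

Accuracy on Blind test: 0.7
"""

def parse_model_blocks(text):
    """Collect mean CV metrics and blind-test scores per model name."""
    results = {}
    model = None  # model whose block is currently being read
    key = None    # most recent metric key seen
    for raw in text.splitlines():
        line = raw.strip()
        if m := re.match(r"Model_name: (.+)", line):
            model = m.group(1)
            results[model] = {}
        elif m := re.match(r"key: (\w+)", line):
            key = m.group(1)
        elif (m := re.match(r"mean value: (\S+)", line)) and model and key:
            results[model][f"mean_{key}"] = float(m.group(1))
        elif (m := re.match(r"(MCC|Accuracy) on Blind test: (\S+)", line)) and model:
            results[model][f"blind_{m.group(1).lower()}"] = float(m.group(2))
    return results

results = parse_model_blocks(sample)
for name, metrics in results.items():
    print(name, metrics)
```

Lines that appear before the first Model_name line are skipped, so a log fragment that starts mid-block does not raise an error.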

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01141    0.01607895 0.01427817 0.01620102 0.01566148 0.01515269
 0.01575112 0.01580167 0.01595831 0.01498699]

mean value: 0.015128040313720703

key: score_time
value: [0.00860381 0.01093388 0.01091313 0.01146984 0.01156068 0.01149845
 0.0115273  0.01151919 0.01151872 0.01149392]

mean value: 0.01110389232635498

key: test_mcc
value: [0.35227273 0.7824608  0.36803496 0.58655573 0.40849122 0.76623377
 0.56061191 0.2548236  0.40291148 0.32232919]

mean value: 0.48047253774078613

key: train_mcc
value: [0.87956612 0.94974006 0.81149011 0.84765971 0.81698712 0.8872319
 0.54476067 0.91320801 0.83195371 0.74686754]

mean value: 0.8229464950111158

key: test_accuracy
value: [0.68421053 0.89473684 0.68421053 0.78947368 0.73684211 0.88888889
 0.77777778 0.66666667 0.72222222 0.61111111]

mean value: 0.7456140350877193

key: train_accuracy
value: [0.93975904 0.97590361 0.90963855 0.92168675 0.90963855 0.94610778
 0.77245509 0.95808383 0.91017964 0.85628743]

mean value: 0.9099740278479186

key: test_fscore
value: [0.72727273 0.92307692 0.72727273 0.81818182 0.81481481 0.90909091
 0.84615385 0.76923077 0.7826087  0.58823529]

mean value: 0.7905938524864355

key: train_fscore
value: [0.94949495 0.98019802 0.93023256 0.93264249 0.93150685 0.95774648
 0.8442623  0.96713615 0.92146597 0.86813187]

mean value: 0.9282817624706369

key: test_precision
value: [0.72727273 0.85714286 0.8        0.9        0.73333333 0.90909091
 0.73333333 0.66666667 0.75       0.83333333]

mean value: 0.791017316017316

key: train_precision
value: [0.98947368 0.99       0.88495575 0.98901099 0.87179487 0.92727273
 0.73049645 0.93636364 1.         1.        ]

mean value: 0.9319368114765849

key: test_recall
value: [0.72727273 1.         0.66666667 0.75       0.91666667 0.90909091
 1.         0.90909091 0.81818182 0.45454545]

mean value: 0.8151515151515152

key: train_recall
value: [0.91262136 0.97058824 0.98039216 0.88235294 1.         0.99029126
 1.         1.         0.85436893 0.76699029]

mean value: 0.9357605177993528

key: test_roc_auc
value: [0.67613636 0.85714286 0.69047619 0.80357143 0.67261905 0.88311688
 0.71428571 0.5974026  0.69480519 0.65584416]

mean value: 0.7245400432900433

key: train_roc_auc
value: [0.94837417 0.97748162 0.88863358 0.93336397 0.8828125  0.93264563
 0.703125   0.9453125  0.92718447 0.88349515]

mean value: 0.9022428581060256

key: test_jcc
value: [0.57142857 0.85714286 0.57142857 0.69230769 0.6875     0.83333333
 0.73333333 0.625      0.64285714 0.41666667]

mean value: 0.6630998168498169

key: train_jcc
value: [0.90384615 0.96116505 0.86956522 0.87378641 0.87179487 0.91891892
 0.73049645 0.93636364 0.85436893 0.76699029]

mean value: 0.8687295931827245

MCC on Blind test: 0.3

Accuracy on Blind test: 0.64

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01471901 0.01381922 0.01489019 0.01392627 0.01413941 0.01386118
 0.01413894 0.01493621 0.01305819 0.01561236]

mean value: 0.014310097694396973

key: score_time
value: [0.01162362 0.01153278 0.01147699 0.01158118 0.01145577 0.01147771
 0.01151061 0.01157641 0.0116725  0.01168513]

mean value: 0.011559271812438964

key: test_mcc
value: [0.29545455 0.65133895 0.53468154 0.3086067  0.65477023 0.66254135
 0.         0.3040345  0.2987013  0.2548236 ]

mean value: 0.39649527059477535

key: train_mcc
value: [0.76345722 0.73678413 0.61692545 0.46724931 0.91088941 0.80279484
 0.28456079 0.54476067 0.64944256 0.95111825]

mean value: 0.6727982631270347

key: test_accuracy
value: [0.63157895 0.78947368 0.78947368 0.68421053 0.84210526 0.83333333
 0.61111111 0.66666667 0.66666667 0.61111111]

mean value: 0.7125730994152046

key: train_accuracy
value: [0.86746988 0.84939759 0.80722892 0.73493976 0.95783133 0.89820359
 0.66467066 0.77245509 0.82035928 0.9760479 ]

mean value: 0.8348603996825626

key: test_fscore
value: [0.63157895 0.8        0.84615385 0.8        0.88       0.85714286
 0.75862069 0.78571429 0.72727273 0.63157895]

mean value: 0.7718062300675731

key: train_fscore
value: [0.88043478 0.8603352  0.86440678 0.82258065 0.96618357 0.9119171
 0.78625954 0.8442623  0.84210526 0.98019802]

mean value: 0.8758683196313127

key: test_precision
value: [0.75       1.         0.78571429 0.66666667 0.84615385 0.9
 0.61111111 0.64705882 0.72727273 0.75      ]

mean value: 0.7683977460448048

key: train_precision
value: [1.         1.         0.76119403 0.69863014 0.95238095 0.97777778
 0.64779874 0.73049645 0.91954023 1.        ]

mean value: 0.8687818322919909

key: test_recall
value: [0.54545455 0.66666667 0.91666667 1.         0.91666667 0.81818182
 1.         1.         0.72727273 0.54545455]

mean value: 0.8136363636363636

key: train_recall
value: [0.78640777 0.75490196 1.         1.         0.98039216 0.85436893
 1.         1.         0.77669903 0.96116505]

mean value: 0.9113934894346087

key: test_roc_auc
value: [0.64772727 0.83333333 0.74404762 0.57142857 0.81547619 0.83766234
 0.5        0.57142857 0.64935065 0.62987013]

mean value: 0.6800324675324675

key: train_roc_auc
value: [0.89320388 0.87745098 0.75       0.65625    0.95113358 0.91155947
 0.5625     0.703125   0.83366201 0.98058252]

mean value: 0.8119467447173044

key: test_jcc
value: [0.46153846 0.66666667 0.73333333 0.66666667 0.78571429 0.75
 0.61111111 0.64705882 0.57142857 0.46153846]

mean value: 0.635505638152697

key: train_jcc
value: [0.78640777 0.75490196 0.76119403 0.69863014 0.93457944 0.83809524
 0.64779874 0.73049645 0.72727273 0.96116505]

mean value: 0.7840541543814717

MCC on Blind test: 0.25

Accuracy on Blind test: 0.62

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.12054205 0.10526896 0.10383749 0.10514784 0.10670519 0.10818172
 0.11293268 0.11231232 0.10736513 0.10779858]

mean value: 0.10900919437408448

key: score_time
value: [0.01530385 0.0149107  0.01513171 0.01525569 0.01502442 0.01519895
 0.0160737  0.01602936 0.0148201  0.01555538]

mean value: 0.0153303861618042

key: test_mcc
value: [0.56729535 0.67460105 1.         0.89559105 0.67460105 0.76623377
 0.89188259 0.52299758 0.77742884 0.53246753]

mean value: 0.7303098813708148

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.84210526 1.         0.94736842 0.84210526 0.88888889
 0.94444444 0.77777778 0.88888889 0.77777778]

mean value: 0.8698830409356725

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.86956522 1.         0.95652174 0.86956522 0.90909091
 0.95238095 0.83333333 0.91666667 0.81818182]

mean value: 0.8958639186900056

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.76923077 0.90909091 1.         1.         0.90909091 0.90909091
 1.         0.76923077 0.84615385 0.81818182]

mean value: 0.893006993006993

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.83333333 1.         0.91666667 0.83333333 0.90909091
 0.90909091 0.90909091 1.         0.81818182]

mean value: 0.9037878787878788

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.76704545 0.8452381  1.         0.95833333 0.8452381  0.88311688
 0.95454545 0.74025974 0.85714286 0.76623377]

mean value: 0.861715367965368

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.76923077 1.         0.91666667 0.76923077 0.83333333
 0.90909091 0.71428571 0.84615385 0.69230769]

mean value: 0.8164585414585415

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.16

Accuracy on Blind test: 0.56
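
Each "mean value" line appears to be the arithmetic mean of the ten per-fold entries on the preceding "value" line. Ten entries per key suggests 10-fold cross-validation, which is an assumption, as this part of the log does not state it. A minimal sketch that recomputes the AdaBoost test_mcc mean from the logged fold values and also reports the fold-to-fold range:

```python
from statistics import fmean

# Per-fold test_mcc values copied from the AdaBoost block above.
test_mcc = [0.56729535, 0.67460105, 1.0, 0.89559105, 0.67460105,
            0.76623377, 0.89188259, 0.52299758, 0.77742884, 0.53246753]

# Arithmetic mean over folds; compare with the logged "mean value" line.
mean_mcc = fmean(test_mcc)

# Range between the best and worst fold, a rough indicator of fold variance.
spread = max(test_mcc) - min(test_mcc)

print(f"mean test_mcc: {mean_mcc:.4f}")
print(f"fold range:    {spread:.4f}")
```

With 18 or 19 samples per test fold (consistent with the fractions in test_accuracy), a wide fold range is expected, so the mean alone can look more stable than it is.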

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04221368 0.03988671 0.04898667 0.03659725 0.04938769 0.03940272
 0.03929019 0.05242944 0.05490804 0.03600407]

mean value: 0.043910646438598634

key: score_time
value: [0.01941323 0.02854228 0.03419876 0.02406669 0.01860428 0.01948833
 0.02925777 0.02827168 0.01718235 0.01625252]

mean value: 0.02352778911590576

key: test_mcc
value: [0.56729535 1.         0.80507649 0.89559105 1.         0.66254135
 0.79772404 0.56061191 0.88640526 0.53246753]

mean value: 0.7707712971733849

key: train_mcc
value: [1.         0.97457108 0.96182348 0.9873287  0.96204463 1.
 0.98744925 0.97466626 0.98744925 0.94933931]

mean value: 0.9784671941115295

key: test_accuracy
value: [0.78947368 1.         0.89473684 0.94736842 1.         0.83333333
 0.88888889 0.77777778 0.94444444 0.77777778]

mean value: 0.8853801169590643

key: train_accuracy
value: [1.         0.98795181 0.98192771 0.9939759  0.98192771 1.
 0.99401198 0.98802395 0.99401198 0.9760479 ]

mean value: 0.9897878940913354

key: test_fscore
value: [0.83333333 1.         0.90909091 0.95652174 1.         0.85714286
 0.9        0.84615385 0.95652174 0.81818182]

mean value: 0.9076946242163634

key: train_fscore
value: [1.         0.99019608 0.98536585 0.99512195 0.98522167 1.
 0.99512195 0.99029126 0.99512195 0.98076923]

mean value: 0.9917209953530446

key: test_precision
value: [0.76923077 1.         1.         1.         1.         0.9
 1.         0.73333333 0.91666667 0.81818182]

mean value: 0.9137412587412588

key: train_precision
value: [1.         0.99019608 0.98058252 0.99029126 0.99009901 1.
 1.         0.99029126 1.         0.97142857]

mean value: 0.9912888708304624

key: test_recall
value: [0.90909091 1.         0.83333333 0.91666667 1.         0.81818182
 0.81818182 1.         1.         0.81818182]

mean value: 0.9113636363636364

key: train_recall
value: [1.         0.99019608 0.99019608 1.         0.98039216 1.
 0.99029126 0.99029126 0.99029126 0.99029126]

mean value: 0.992194936226918

key: test_roc_auc
value: [0.76704545 1.         0.91666667 0.95833333 1.         0.83766234
 0.90909091 0.71428571 0.92857143 0.76623377]

mean value: 0.8797889610389611

key: train_roc_auc
value: [1.         0.98728554 0.97947304 0.9921875  0.98238358 1.
 0.99514563 0.98733313 0.99514563 0.97170813]

mean value: 0.989066218113459

key: test_jcc
value: [0.71428571 1.         0.83333333 0.91666667 1.         0.75
 0.81818182 0.73333333 0.91666667 0.69230769]

mean value: 0.8374775224775225

key: train_jcc
value: [1.         0.98058252 0.97115385 0.99029126 0.97087379 1.
 0.99029126 0.98076923 0.99029126 0.96226415]

mean value: 0.9836517324953852

MCC on Blind test: 0.14

Accuracy on Blind test: 0.55

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03705645 0.05874968 0.07011271 0.0530026  0.05404568 0.05509114
 0.02280903 0.02378893 0.02255702 0.03889585]

mean value: 0.04361090660095215

key: score_time
value: [0.02256751 0.02284932 0.02406359 0.02447772 0.02279687 0.02272964
 0.01286435 0.0128026  0.01270914 0.03028536]

mean value: 0.02081460952758789

key: test_mcc
value: [ 0.56729535  0.14085904  0.0952381  -0.03149704 -0.12677314  0.01413507
  0.12182898  0.39594419  0.01413507  0.01413507]

mean value: 0.12053006836854342

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.63157895 0.57894737 0.57894737 0.52631579 0.55555556
 0.61111111 0.72222222 0.55555556 0.55555556]

mean value: 0.6105263157894737

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.74074074 0.66666667 0.71428571 0.66666667 0.66666667
 0.72       0.8        0.66666667 0.66666667]

mean value: 0.7141693121693122

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.76923077 0.66666667 0.66666667 0.625      0.6        0.61538462
 0.64285714 0.71428571 0.61538462 0.61538462]

mean value: 0.6530860805860806

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.83333333 0.66666667 0.83333333 0.75       0.72727273
 0.81818182 0.90909091 0.72727273 0.72727273]

mean value: 0.7901515151515152

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.76704545 0.55952381 0.54761905 0.48809524 0.44642857 0.50649351
 0.55194805 0.66883117 0.50649351 0.50649351]

mean value: 0.5548971861471862

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.58823529 0.5        0.55555556 0.5        0.5
 0.5625     0.66666667 0.5        0.5       ]

mean value: 0.5587243230625584

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.32965541 0.33532381 0.32610035 0.33066726 0.31862354 0.31567144
 0.32750487 0.32567739 0.32546377 0.32319188]

mean value: 0.3257879734039307

key: score_time
value: [0.00963497 0.00933599 0.00911212 0.00913239 0.00947118 0.00982332
 0.01011229 0.0100255  0.01008368 0.01003385]

mean value: 0.009676527976989747

key: test_mcc
value: [0.56818182 0.77380952 0.89559105 1.         0.7824608  0.76623377
 1.         0.39594419 0.88640526 0.64465837]

mean value: 0.7713284776335373

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.78947368 0.89473684 0.94736842 1.         0.89473684 0.88888889
 1.         0.72222222 0.94444444 0.83333333]

mean value: 0.8915204678362573

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.81818182 0.91666667 0.95652174 1.         0.92307692 0.90909091
 1.         0.8        0.95652174 0.86956522]

mean value: 0.9149625012668491

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.81818182 0.91666667 1.         1.         0.85714286 0.90909091
 1.         0.71428571 0.91666667 0.83333333]

mean value: 0.8965367965367965

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.91666667 0.91666667 1.         1.         0.90909091
 1.         0.90909091 1.         0.90909091]

mean value: 0.9378787878787879

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.78409091 0.88690476 0.95833333 1.         0.85714286 0.88311688
 1.         0.66883117 0.92857143 0.81168831]

mean value: 0.8778679653679654

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.69230769 0.84615385 0.91666667 1.         0.85714286 0.83333333
 1.         0.66666667 0.91666667 0.76923077]

mean value: 0.8498168498168498

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.54
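
The four models logged above show a large drop from training MCC to cross-validated MCC to blind-test MCC. The sketch below tabulates those gaps. The figures are rounded from the "mean value" and "MCC on Blind test" lines above; the dictionary layout and variable names are illustrative only and are not taken from the original script.

```python
# (mean train_mcc, mean test_mcc, blind-test MCC), rounded from the log above.
results = {
    "AdaBoost Classifier": (1.0, 0.7303, 0.16),
    "Bagging Classifier": (0.9785, 0.7708, 0.14),
    "Gaussian Process": (1.0, 0.1205, 0.13),
    "Gradient Boosting": (1.0, 0.7713, 0.12),
}

gaps = {}
for name, (train, cv_test, blind) in results.items():
    # First gap: train vs cross-validated test. Second gap: CV test vs blind test.
    gaps[name] = (round(train - cv_test, 4), round(cv_test - blind, 4))
    print(f"{name:20s} train-CV gap: {gaps[name][0]:.4f}  "
          f"CV-blind gap: {gaps[name][1]:.4f}")
```

A near-perfect training score combined with a large train-CV gap (Gaussian Process) points to overfitting within the training data, while a large CV-blind gap (AdaBoost, Bagging, Gradient Boosting) suggests the blind set differs from the training distribution.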

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01851869 0.01976252 0.01977897 0.02018905 0.01982856 0.02080202
 0.02046824 0.02414322 0.02400231 0.02459073]

mean value: 0.021208429336547853

key: score_time
value: [0.0122931  0.01221442 0.01403546 0.01435971 0.01452565 0.01226997
 0.01526237 0.01487613 0.01891351 0.02681375]

mean value: 0.015556406974792481

key: test_mcc
value: [-0.20100756  0.18531233 -0.01163105  0.09356015  0.09356015 -0.1934765
  0.2987013   0.3040345  -0.24029619  0.2987013 ]

mean value: 0.0627458419060211

key: train_mcc
value: [1.         1.         0.97474109 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9974741089883715

key: test_accuracy
value: [0.52631579 0.63157895 0.47368421 0.63157895 0.63157895 0.55555556
 0.66666667 0.66666667 0.44444444 0.66666667]

mean value: 0.5894736842105263

key: train_accuracy
value: [1.         1.         0.98795181 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9987951807228915

key: test_fscore
value: [0.68965517 0.72       0.5        0.75862069 0.75862069 0.71428571
 0.72727273 0.78571429 0.58333333 0.72727273]

mean value: 0.6964775339602925

key: train_fscore
value: [1.         1.         0.99029126 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9990291262135922

key: test_precision
value: [0.55555556 0.69230769 0.625      0.64705882 0.64705882 0.58823529
 0.72727273 0.64705882 0.53846154 0.72727273]

mean value: 0.6395282005576124

key: train_precision
value: [1.         1.         0.98076923 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9980769230769231

key: test_recall
value: [0.90909091 0.75       0.41666667 0.91666667 0.91666667 0.90909091
 0.72727273 1.         0.63636364 0.72727273]

mean value: 0.7909090909090909

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.45454545 0.58928571 0.49404762 0.5297619  0.5297619  0.45454545
 0.64935065 0.57142857 0.38961039 0.64935065]

mean value: 0.5311688311688312

key: train_roc_auc
value: [1.       1.       0.984375 1.       1.       1.       1.       1.
 1.       1.      ]

mean value: 0.9984375

key: test_jcc
value: [0.52631579 0.5625     0.33333333 0.61111111 0.61111111 0.55555556
 0.57142857 0.64705882 0.41176471 0.57142857]

mean value: 0.5401607572853703

key: train_jcc
value: [1.         1.         0.98076923 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9980769230769231

MCC on Blind test: 0.27

Accuracy on Blind test: 0.62

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03033352 0.0397439  0.03561759 0.03916883 0.03715229 0.03341866
 0.03735018 0.05201197 0.05242395 0.05699086]

mean value: 0.04142117500305176

key: score_time
value: [0.02387834 0.0204978  0.02155447 0.02060032 0.02355218 0.02056217
 0.02346349 0.02178597 0.02437901 0.02108812]

mean value: 0.02213618755340576

key: test_mcc
value: [0.10863102 0.42004128 0.67460105 0.77380952 0.77380952 0.76623377
 0.56407607 0.40291148 0.44320263 0.53246753]

mean value: 0.5459783889001629

key: train_mcc
value: [0.92308458 0.93744159 0.89919089 0.91088941 0.89798254 0.91188694
 0.92430455 0.92539974 0.94997541 0.91188694]

mean value: 0.919204259351807

key: test_accuracy
value: [0.57894737 0.73684211 0.84210526 0.89473684 0.89473684 0.88888889
 0.72222222 0.72222222 0.72222222 0.77777778]

mean value: 0.7780701754385965

key: train_accuracy
value: [0.96385542 0.96987952 0.95180723 0.95783133 0.95180723 0.95808383
 0.96407186 0.96407186 0.9760479  0.95808383]

mean value: 0.9615540004328692

key: test_fscore
value: [0.66666667 0.8        0.86956522 0.91666667 0.91666667 0.90909091
 0.70588235 0.7826087  0.81481481 0.81818182]

mean value: 0.8200143808072197

key: train_fscore
value: [0.97115385 0.97607656 0.96190476 0.96618357 0.96116505 0.96682464
 0.97142857 0.97169811 0.98095238 0.96682464]

mean value: 0.9694212141193473

key: test_precision
value: [0.61538462 0.76923077 0.90909091 0.91666667 0.91666667 0.90909091
 1.         0.75       0.6875     0.81818182]

mean value: 0.8291812354312355

key: train_precision
value: [0.96190476 0.95327103 0.93518519 0.95238095 0.95192308 0.94444444
 0.95327103 0.94495413 0.96261682 0.94444444]

mean value: 0.9504395872227905

key: test_recall
value: [0.72727273 0.83333333 0.83333333 0.91666667 0.91666667 0.90909091
 0.54545455 0.81818182 1.         0.81818182]

mean value: 0.8318181818181818

key: train_recall
value: [0.98058252 1.         0.99019608 0.98039216 0.97058824 0.99029126
 0.99029126 1.         1.         0.99029126]

mean value: 0.9892632781267847

key: test_roc_auc
value: [0.55113636 0.70238095 0.8452381  0.88690476 0.88690476 0.88311688
 0.77272727 0.69480519 0.64285714 0.76623377]

mean value: 0.7632305194805196

key: train_roc_auc
value: [0.95854523 0.9609375  0.94041054 0.95113358 0.94623162 0.94827063
 0.95608313 0.953125   0.96875    0.94827063]

mean value: 0.9531757858887892

key: test_jcc
value: [0.5        0.66666667 0.76923077 0.84615385 0.84615385 0.83333333
 0.54545455 0.64285714 0.6875     0.69230769]

mean value: 0.7029657842157843

key: train_jcc
value: [0.94392523 0.95327103 0.9266055  0.93457944 0.92523364 0.93577982
 0.94444444 0.94495413 0.96261682 0.93577982]

mean value: 0.940718987872379

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6
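
The logged per-fold metrics are all functions of one confusion matrix. As a worked check, the sketch below uses confusion counts inferred from Ridge Classifier fold 1 (accuracy 11/19, precision 8/13, recall 8/11). The counts themselves are not in the log and are an assumption. It also assumes that "jcc" denotes the Jaccard index of the positive class and that ROC AUC was scored on hard class labels, in which case it equals balanced accuracy; the fold-1 values are numerically consistent with both readings.

```python
from math import sqrt

# Confusion counts inferred (not logged) from Ridge Classifier fold 1.
tp, fp, fn, tn = 8, 5, 3, 3

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)

# Jaccard index of the positive class; related to F-score by J = F / (2 - F).
jcc = tp / (tp + fp + fn)

# Balanced accuracy; matches ROC AUC when AUC is computed from hard labels.
roc_auc_from_labels = (recall + specificity) / 2

# Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for label, value in [("accuracy", accuracy), ("precision", precision),
                     ("recall", recall), ("fscore", fscore), ("jcc", jcc),
                     ("roc_auc", roc_auc_from_labels), ("mcc", mcc)]:
    print(f"{label:10s} {value:.8f}")
```

The fold has high recall but low specificity (3 of 8 negatives correct), which is why MCC is near 0.11 even though the F-score is about 0.67.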

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.37415504 0.30020595 0.36566615 0.33783174 0.32791495 0.40239978
 0.32688546 0.32808876 0.33113956 0.33140039]

mean value: 0.34256877899169924

key: score_time
value: [0.02503872 0.02488184 0.02039814 0.02254295 0.01626158 0.02357626
 0.023417   0.02344203 0.02036166 0.02339745]

mean value: 0.022331762313842773

key: test_mcc
value: [0.10863102 0.54761905 0.67460105 0.77380952 0.77380952 0.76623377
 0.56407607 0.40291148 0.39594419 0.71350607]

mean value: 0.5721141750028133

key: train_mcc
value: [0.92308458 0.92403878 0.89919089 0.91088941 0.89798254 0.91188694
 0.92430455 0.92539974 0.94933931 0.94933931]

mean value: 0.9215456054544839

key: test_accuracy
value: [0.57894737 0.78947368 0.84210526 0.89473684 0.89473684 0.88888889
 0.72222222 0.72222222 0.72222222 0.83333333]

mean value: 0.7888888888888889

key: train_accuracy
value: [0.96385542 0.96385542 0.95180723 0.95783133 0.95180723 0.95808383
 0.96407186 0.96407186 0.9760479  0.9760479 ]

mean value: 0.9627479979799437

key: test_fscore
value: [0.66666667 0.83333333 0.86956522 0.91666667 0.91666667 0.90909091
 0.70588235 0.7826087  0.8        0.84210526]

mean value: 0.8242585771566792

/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

key: train_fscore
value: [0.97115385 0.97115385 0.96190476 0.96618357 0.96116505 0.96682464
 0.97142857 0.97169811 0.98076923 0.98076923]

mean value: 0.9703050868359714

key: test_precision
value: [0.61538462 0.83333333 0.90909091 0.91666667 0.91666667 0.90909091
 1.         0.75       0.71428571 1.        ]

mean value: 0.8564518814518814

key: train_precision
value: [0.96190476 0.95283019 0.93518519 0.95238095 0.95192308 0.94444444
 0.95327103 0.94495413 0.97142857 0.97142857]

mean value: 0.9539750908852559

key: test_recall
value: [0.72727273 0.83333333 0.83333333 0.91666667 0.91666667 0.90909091
 0.54545455 0.81818182 0.90909091 0.72727273]

mean value: 0.8136363636363636

key: train_recall
value: [0.98058252 0.99019608 0.99019608 0.98039216 0.97058824 0.99029126
 0.99029126 1.         0.99029126 0.99029126]

mean value: 0.9873120121835142

key: test_roc_auc
value: [0.55113636 0.77380952 0.8452381  0.88690476 0.88690476 0.88311688
 0.77272727 0.69480519 0.66883117 0.86363636]

mean value: 0.782711038961039

key: train_roc_auc
value: [0.95854523 0.95603554 0.94041054 0.95113358 0.94623162 0.94827063
 0.95608313 0.953125   0.97170813 0.97170813]

mean value: 0.9553251529171539

key: test_jcc
value: [0.5        0.71428571 0.76923077 0.84615385 0.84615385 0.83333333
 0.54545455 0.64285714 0.66666667 0.72727273]

mean value: 0.7091408591408591

key: train_jcc
value: [0.94392523 0.94392523 0.9266055  0.93457944 0.92523364 0.93577982
 0.94444444 0.94495413 0.96226415 0.96226415]

mean value: 0.942397574727439

MCC on Blind test: 0.14

Accuracy on Blind test: 0.57

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03312874 0.06524324 0.10863423 0.13530827 0.03684163 0.03828144
 0.03505182 0.06022382 0.09920526 0.03362274]

mean value: 0.06455411911010742

key: score_time
value: [0.01303244 0.01533794 0.0123136  0.01206446 0.01194906 0.01522326
 0.01591754 0.01242018 0.01871157 0.01189852]

mean value: 0.013886857032775878

key: test_mcc
value: [0.74047959 0.6992059  0.56818182 0.56490196 0.65151515 0.83971912
 0.74047959 0.66414149 0.45454545 0.37796447]

mean value: 0.6301134542242922

key: train_mcc
value: [0.83418999 0.88292404 0.903143   0.85368872 0.85370265 0.88292404
 0.85368872 0.83418999 0.81557242 0.88366175]

mean value: 0.8597685325684289

key: test_accuracy
value: [0.86956522 0.82608696 0.7826087  0.7826087  0.82608696 0.91304348
 0.86956522 0.82608696 0.72727273 0.68181818]

mean value: 0.8104743083003952

key: train_accuracy
value: [0.91707317 0.94146341 0.95121951 0.92682927 0.92682927 0.94146341
 0.92682927 0.91707317 0.90776699 0.94174757]

mean value: 0.9298295050911675

key: test_fscore
value: [0.85714286 0.84615385 0.7826087  0.76190476 0.83333333 0.90909091
 0.88       0.81818182 0.72727273 0.63157895]

mean value: 0.8047267896100848

key: train_fscore
value: [0.91707317 0.94174757 0.95049505 0.92753623 0.92682927 0.94117647
 0.92610837 0.91707317 0.90731707 0.94230769]

mean value: 0.9297664074411536

key: test_precision
value: [0.9        0.73333333 0.75       0.8        0.83333333 1.
 0.84615385 0.9        0.72727273 0.75      ]

mean value: 0.824009324009324

key: train_precision
value: [0.92156863 0.94174757 0.96969697 0.92307692 0.9223301  0.94117647
 0.93069307 0.91262136 0.91176471 0.93333333]

mean value: 0.9308009128461939

key: test_recall
value: [0.81818182 1.         0.81818182 0.72727273 0.83333333 0.83333333
 0.91666667 0.75       0.72727273 0.54545455]

mean value: 0.796969696969697

key: train_recall
value: [0.91262136 0.94174757 0.93203883 0.93203883 0.93137255 0.94117647
 0.92156863 0.92156863 0.90291262 0.95145631]

mean value: 0.9288501808490387

key: test_roc_auc
value: [0.86742424 0.83333333 0.78409091 0.78030303 0.82575758 0.91666667
 0.86742424 0.82954545 0.72727273 0.68181818]

mean value: 0.8113636363636364

key: train_roc_auc
value: [0.91709499 0.94146202 0.95131354 0.92680373 0.92685132 0.94146202
 0.92680373 0.91709499 0.90776699 0.94174757]

mean value: 0.9298400913763564

key: test_jcc
value: [0.75       0.73333333 0.64285714 0.61538462 0.71428571 0.83333333
 0.78571429 0.69230769 0.57142857 0.46153846]

mean value: 0.680018315018315

key: train_jcc
value: [0.84684685 0.88990826 0.90566038 0.86486486 0.86363636 0.88888889
 0.86238532 0.84684685 0.83035714 0.89090909]

mean value: 0.8690304000190187

MCC on Blind test: 0.33

Accuracy on Blind test: 0.67
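
The `test_jcc` and `test_fscore` arrays in the Logistic Regression block above are consistent with the standard identity J = F / (2 - F) between the Jaccard index and the F1 score of a binary class. The sketch below checks this on the first three logged folds; the helper name `jaccard_from_f1` is hypothetical and is not part of the pipeline.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the Jaccard index: J = F / (2 - F)."""
    return f1 / (2.0 - f1)


# First three per-fold values copied from the Logistic Regression block above.
test_fscore = [0.85714286, 0.84615385, 0.7826087]
test_jcc = [0.75, 0.73333333, 0.64285714]

for f1, jcc in zip(test_fscore, test_jcc):
    derived = jaccard_from_f1(f1)
    print(f"F1={f1:.4f} -> J={derived:.4f} (logged {jcc:.4f})")
```

Because one score is a monotonic function of the other, the two metrics rank the folds identically and carry no independent information when comparing models.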

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.82665825 0.93447065 0.77925324 0.78353238 0.88928652 0.84824395
 0.78165197 0.86581111 0.82213378 0.89065075]

mean value: 0.8421692609786987

key: score_time
value: [0.01458549 0.01205873 0.01185203 0.01503658 0.0150156  0.01501513
 0.0152204  0.01511884 0.01503301 0.01509953]

mean value: 0.014403533935546876

key: test_mcc
value: [0.65909298 0.63327851 0.56818182 0.65151515 0.74242424 0.91666667
 0.74047959 0.65151515 0.63636364 0.46225016]

mean value: 0.6661767909021908

key: train_mcc
value: [1.         0.99029126 0.95126594 1.         0.95126594 0.92194936
 1.         1.         1.         0.9223301 ]

mean value: 0.9737102608033504

key: test_accuracy
value: [0.82608696 0.7826087  0.7826087  0.82608696 0.86956522 0.95652174
 0.86956522 0.82608696 0.81818182 0.72727273]

mean value: 0.8284584980237154

key: train_accuracy
value: [1.         0.99512195 0.97560976 1.         0.97560976 0.96097561
 1.         1.         1.         0.96116505]

mean value: 0.9868482121714421

key: test_fscore
value: [0.8        0.81481481 0.7826087  0.81818182 0.86956522 0.95652174
 0.88       0.83333333 0.81818182 0.7       ]

mean value: 0.8273207436685698

key: train_fscore
value: [1.         0.99512195 0.97560976 1.         0.97560976 0.96078431
 1.         1.         1.         0.96116505]

mean value: 0.9868290825683814

key: test_precision
value: [0.88888889 0.6875     0.75       0.81818182 0.90909091 1.
 0.84615385 0.83333333 0.81818182 0.77777778]

mean value: 0.8329108391608392

key: train_precision
value: [1.         1.         0.98039216 1.         0.97087379 0.96078431
 1.         1.         1.         0.96116505]

mean value: 0.9873215305539692

key: test_recall
value: [0.72727273 1.         0.81818182 0.81818182 0.83333333 0.91666667
 0.91666667 0.83333333 0.81818182 0.63636364]

mean value: 0.8318181818181818

key: train_recall
value: [1.         0.99029126 0.97087379 1.         0.98039216 0.96078431
 1.         1.         1.         0.96116505]

mean value: 0.9863506567675614

key: test_roc_auc
value: [0.8219697  0.79166667 0.78409091 0.82575758 0.87121212 0.95833333
 0.86742424 0.82575758 0.81818182 0.72727273]

mean value: 0.8291666666666666

key: train_roc_auc
value: [1.         0.99514563 0.97563297 1.         0.97563297 0.96097468
 1.         1.         1.         0.96116505]

mean value: 0.9868551304016753

key: test_jcc
value: [0.66666667 0.6875     0.64285714 0.69230769 0.76923077 0.91666667
 0.78571429 0.71428571 0.69230769 0.53846154]

mean value: 0.7105998168498169

key: train_jcc
value: [1.         0.99029126 0.95238095 1.         0.95238095 0.9245283
 1.         1.         1.         0.92523364]

mean value: 0.9744815113644433

MCC on Blind test: 0.29

Accuracy on Blind test: 0.64
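
The per-fold test values in the block above are mutually consistent and can be checked by hand. The sketch below uses confusion counts inferred from the first fold's logged accuracy, precision and recall (TP=8, FP=1, FN=3, TN=11). These counts are an inference, they are not printed anywhere in this log. With them, the logged test_roc_auc equals the mean of recall and specificity, which suggests it was computed from hard class predictions rather than from probabilities.

```python
from math import sqrt

# Confusion counts for fold 1 of the LogisticRegressionCV block.
# Inferred from the logged values (assumption), not taken from the log itself.
tp, fp, fn, tn = 8, 1, 3, 11

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# ROC AUC of a single hard-label operating point reduces to balanced accuracy.
roc_auc = (recall + specificity) / 2

for name, value in [
    ("mcc", mcc), ("accuracy", accuracy), ("fscore", fscore),
    ("precision", precision), ("recall", recall),
    ("roc_auc", roc_auc), ("jcc", jcc),
]:
    print(f"test_{name}: {value:.8f}")
```

The printed values can be compared against the first entry of each test_* array logged for this model.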

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01280308 0.01112795 0.00904179 0.00915575 0.00870848 0.0087254
 0.008672   0.00886917 0.00864339 0.0087316 ]

mean value: 0.009447860717773437

key: score_time
value: [0.01178265 0.00904846 0.0089066  0.00885105 0.00853443 0.00852323
 0.00856495 0.00855088 0.00860286 0.00863481]

mean value: 0.008999991416931152

key: test_mcc
value: [0.41096386 0.44411739 0.38932432 0.15096491 0.38932432 0.3030303
 0.47727273 0.30240737 0.09245003 0.54772256]

mean value: 0.35075777936506775

key: train_mcc
value: [0.4448612  0.44400007 0.46806514 0.53843728 0.47567594 0.45607916
 0.45709726 0.49637007 0.42964161 0.50892419]

mean value: 0.4719151927110299

key: test_accuracy
value: [0.69565217 0.69565217 0.69565217 0.56521739 0.69565217 0.65217391
 0.73913043 0.65217391 0.54545455 0.77272727]

mean value: 0.6709486166007905

key: train_accuracy
value: [0.70243902 0.72195122 0.73170732 0.75609756 0.73658537 0.72682927
 0.72682927 0.74634146 0.71359223 0.75242718]

mean value: 0.7314799905280606

key: test_fscore
value: [0.72       0.74074074 0.66666667 0.61538462 0.72       0.66666667
 0.75       0.69230769 0.58333333 0.76190476]

mean value: 0.6917004477004477

key: train_fscore
value: [0.75502008 0.72727273 0.75113122 0.78991597 0.74766355 0.73831776
 0.74074074 0.75925926 0.6974359  0.76712329]

mean value: 0.747388048921837

key: test_precision
value: [0.64285714 0.625      0.7        0.53333333 0.69230769 0.66666667
 0.75       0.64285714 0.53846154 0.8       ]

mean value: 0.6591483516483516

key: train_precision
value: [0.64383562 0.71698113 0.70338983 0.6962963  0.71428571 0.70535714
 0.70175439 0.71929825 0.73913043 0.72413793]

mean value: 0.7064466729857495

key: test_recall
value: [0.81818182 0.90909091 0.63636364 0.72727273 0.75       0.66666667
 0.75       0.75       0.63636364 0.72727273]

mean value: 0.7371212121212121

key: train_recall
value: [0.91262136 0.73786408 0.80582524 0.91262136 0.78431373 0.7745098
 0.78431373 0.80392157 0.66019417 0.81553398]

mean value: 0.7991719017704169

key: test_roc_auc
value: [0.70075758 0.70454545 0.69318182 0.5719697  0.69318182 0.65151515
 0.73863636 0.64772727 0.54545455 0.77272727]

mean value: 0.6719696969696969

key: train_roc_auc
value: [0.70140872 0.72187322 0.73134399 0.75533029 0.73681706 0.72706073
 0.72710832 0.74662098 0.71359223 0.75242718]

mean value: 0.7313582714639254

key: test_jcc
value: [0.5625     0.58823529 0.5        0.44444444 0.5625     0.5
 0.6        0.52941176 0.41176471 0.61538462]

mean value: 0.5314240824534943

key: train_jcc
value: [0.60645161 0.57142857 0.60144928 0.65277778 0.59701493 0.58518519
 0.58823529 0.6119403  0.53543307 0.62222222]

mean value: 0.5972138233743687

MCC on Blind test: 0.45

Accuracy on Blind test: 0.71
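
The key / value / mean value records in each model block have the shape of the dictionary returned by scikit-learn's cross_validate when it is given several scorers and return_train_score=True. The sketch below shows how such output could be produced. The scorer names are chosen to reproduce the logged keys, while the synthetic data and the bare GaussianNB estimator (no preprocessing step) are assumptions for illustration, not the original script.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (hypothetical; not the feature matrix used in this run).
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Scorer names become the suffixes of the test_* / train_* keys.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # Scored on hard predictions (assumption), not on probabilities.
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": "jaccard",
}

scores = cross_validate(
    GaussianNB(), X, y, cv=10, scoring=scoring, return_train_score=True
)

# One array of 10 fold values per key, followed by its mean.
for key, value in scores.items():
    print("\nkey:", key)
    print("value:", value)
    print("\nmean value:", value.mean())
```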

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00916004 0.00893545 0.0089612  0.00898767 0.00895739 0.00898981
 0.00910544 0.00896788 0.00898862 0.00901413]

mean value: 0.009006762504577636

key: score_time
value: [0.00864172 0.00866985 0.00862956 0.00853229 0.00861835 0.00858736
 0.00872016 0.00858521 0.00866818 0.00860548]

mean value: 0.00862581729888916

key: test_mcc
value: [0.65909298 0.21452908 0.12336594 0.21452908 0.08257228 0.44411739
 0.08257228 0.23262105 0.32539569 0.23570226]

mean value: 0.26144980489209724

key: train_mcc
value: [0.431714   0.44379575 0.47690661 0.38794503 0.41929975 0.43858746
 0.45614118 0.4454215  0.40723148 0.39531893]

mean value: 0.430236169300088

key: test_accuracy
value: [0.82608696 0.60869565 0.56521739 0.60869565 0.52173913 0.69565217
 0.52173913 0.60869565 0.63636364 0.59090909]

mean value: 0.6183794466403162

key: train_accuracy
value: [0.70243902 0.70731707 0.72195122 0.67804878 0.69268293 0.70731707
 0.71707317 0.70731707 0.69417476 0.68446602]

mean value: 0.7012787118162443

key: test_fscore
value: [0.8        0.52631579 0.44444444 0.52631579 0.35294118 0.63157895
 0.35294118 0.57142857 0.5        0.4       ]

mean value: 0.5105965895129981

key: train_fscore
value: [0.64327485 0.64705882 0.66272189 0.60240964 0.61349693 0.64705882
 0.6627907  0.63855422 0.64       0.61538462]

mean value: 0.6372750495347176

key: test_precision
value: [0.88888889 0.625      0.57142857 0.625      0.6        0.85714286
 0.6        0.66666667 0.8        0.75      ]

mean value: 0.6984126984126984

key: train_precision
value: [0.80882353 0.82089552 0.84848485 0.79365079 0.81967213 0.80882353
 0.81428571 0.828125   0.77777778 0.78787879]

mean value: 0.8108417634437052

key: test_recall
value: [0.72727273 0.45454545 0.36363636 0.45454545 0.25       0.5
 0.25       0.5        0.36363636 0.27272727]

mean value: 0.41363636363636364

key: train_recall
value: [0.53398058 0.53398058 0.54368932 0.48543689 0.49019608 0.53921569
 0.55882353 0.51960784 0.54368932 0.50485437]

mean value: 0.5253474205216067

key: test_roc_auc
value: [0.8219697  0.60227273 0.55681818 0.60227273 0.53409091 0.70454545
 0.53409091 0.61363636 0.63636364 0.59090909]

mean value: 0.6196969696969696

key: train_roc_auc
value: [0.7032648  0.70816676 0.72282505 0.67899296 0.69169998 0.70650105
 0.71630497 0.70640586 0.69417476 0.68446602]

mean value: 0.7012802208261946

key: test_jcc
value: [0.66666667 0.35714286 0.28571429 0.35714286 0.21428571 0.46153846
 0.21428571 0.4        0.33333333 0.25      ]

mean value: 0.354010989010989

key: train_jcc
value: [0.47413793 0.47826087 0.49557522 0.43103448 0.44247788 0.47826087
 0.49565217 0.46902655 0.47058824 0.44444444]

mean value: 0.4679458652592843

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00874019 0.00965261 0.0093236  0.00860143 0.00863314 0.00863075
 0.00869918 0.00897694 0.00959516 0.009655  ]

mean value: 0.009050798416137696

key: score_time
value: [0.01461792 0.01053357 0.01011968 0.00984621 0.00996614 0.00994396
 0.01022553 0.01024222 0.01070547 0.01071763]

mean value: 0.01069183349609375

key: test_mcc
value: [0.12878788 0.31298622 0.02585438 0.12406456 0.44411739 0.2096648
 0.25495628 0.3030303  0.23570226 0.        ]

mean value: 0.20391640867334052

key: train_mcc
value: [0.53446628 0.51172946 0.55610418 0.52267493 0.48193786 0.50002007
 0.46832513 0.45886299 0.51700551 0.53764186]

mean value: 0.5088768274038467

key: test_accuracy
value: [0.56521739 0.65217391 0.52173913 0.56521739 0.69565217 0.56521739
 0.60869565 0.65217391 0.59090909 0.5       ]

mean value: 0.591699604743083

key: train_accuracy
value: [0.76585366 0.75121951 0.77073171 0.75609756 0.73658537 0.74634146
 0.72682927 0.72682927 0.75728155 0.76699029]

mean value: 0.7504759649538243

key: test_fscore
value: [0.54545455 0.66666667 0.35294118 0.5        0.63157895 0.375
 0.52631579 0.66666667 0.4        0.42105263]

mean value: 0.5085676423679519

key: train_fscore
value: [0.75510204 0.72727273 0.7431694  0.7311828  0.70652174 0.72043011
 0.68539326 0.70212766 0.74489796 0.75257732]

mean value: 0.7268675006125136

key: test_precision
value: [0.54545455 0.61538462 0.5        0.55555556 0.85714286 0.75
 0.71428571 0.66666667 0.75       0.5       ]

mean value: 0.6454489954489955

key: train_precision
value: [0.79569892 0.80952381 0.85       0.81927711 0.79268293 0.79761905
 0.80263158 0.76744186 0.78494624 0.8021978 ]

mean value: 0.802201929530647

key: test_recall
value: [0.54545455 0.72727273 0.27272727 0.45454545 0.5        0.25
 0.41666667 0.66666667 0.27272727 0.36363636]

mean value: 0.44696969696969696

key: train_recall
value: [0.7184466  0.66019417 0.66019417 0.66019417 0.6372549  0.65686275
 0.59803922 0.64705882 0.70873786 0.70873786]

mean value: 0.6655720540643442

key: test_roc_auc
value: [0.56439394 0.65530303 0.51136364 0.56060606 0.70454545 0.57954545
 0.61742424 0.65151515 0.59090909 0.5       ]

mean value: 0.593560606060606

key: train_roc_auc
value: [0.76608605 0.75166571 0.77127356 0.75656768 0.73610318 0.7459071
 0.72620407 0.72644203 0.75728155 0.76699029]

mean value: 0.7504521225966115

key: test_jcc
value: [0.375      0.5        0.21428571 0.33333333 0.46153846 0.23076923
 0.35714286 0.5        0.25       0.26666667]

mean value: 0.3488736263736264

key: train_jcc
value: [0.60655738 0.57142857 0.59130435 0.57627119 0.54621849 0.56302521
 0.52136752 0.54098361 0.59349593 0.60330579]

mean value: 0.5713958028231724

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.0141499  0.01209903 0.01171756 0.01287889 0.01204228 0.01184416
 0.01186538 0.01188636 0.01186728 0.01376891]

mean value: 0.012411975860595703

key: score_time
value: [0.01020265 0.00940251 0.00930929 0.01236463 0.00963902 0.00941229
 0.00974226 0.00945449 0.00942039 0.01031494]

mean value: 0.009926247596740722

key: test_mcc
value: [0.58002308 0.6992059  0.38932432 0.21374669 0.5164589  0.58930667
 0.58930667 0.39393939 0.2773501  0.2773501 ]

mean value: 0.4526011806094105

key: train_mcc
value: [0.68838106 0.71237056 0.72307355 0.72506339 0.70305132 0.70305132
 0.66368352 0.72693519 0.72423827 0.74069712]

mean value: 0.7110545294044077

key: test_accuracy
value: [0.7826087  0.82608696 0.69565217 0.60869565 0.73913043 0.7826087
 0.7826087  0.69565217 0.63636364 0.63636364]

mean value: 0.7185770750988142

key: train_accuracy
value: [0.84390244 0.85365854 0.85853659 0.85853659 0.84878049 0.84878049
 0.82926829 0.86341463 0.8592233  0.86893204]

mean value: 0.8533033388586313

key: test_fscore
value: [0.73684211 0.84615385 0.66666667 0.57142857 0.7        0.76190476
 0.76190476 0.69565217 0.6        0.6       ]

mean value: 0.6940552887234809

key: train_fscore
value: [0.84158416 0.84536082 0.84974093 0.84816754 0.83769634 0.83769634
 0.81675393 0.86138614 0.84974093 0.86294416]

mean value: 0.8451071285619147

key: test_precision
value: [0.875      0.73333333 0.7        0.6        0.875      0.88888889
 0.88888889 0.72727273 0.66666667 0.66666667]

mean value: 0.7621717171717172

key: train_precision
value: [0.85858586 0.9010989  0.91111111 0.92045455 0.8988764  0.8988764
 0.87640449 0.87       0.91111111 0.90425532]

mean value: 0.895077414988125

key: test_recall
value: [0.63636364 1.         0.63636364 0.54545455 0.58333333 0.66666667
 0.66666667 0.66666667 0.54545455 0.54545455]

mean value: 0.6492424242424242

key: train_recall
value: [0.82524272 0.7961165  0.7961165  0.78640777 0.78431373 0.78431373
 0.76470588 0.85294118 0.7961165  0.82524272]

mean value: 0.8011517228250523

key: test_roc_auc
value: [0.77651515 0.83333333 0.69318182 0.60606061 0.74621212 0.78787879
 0.78787879 0.6969697  0.63636364 0.63636364]

mean value: 0.7200757575757576

key: train_roc_auc
value: [0.84399391 0.85394061 0.85884257 0.85889016 0.84846754 0.84846754
 0.82895488 0.86336379 0.8592233  0.86893204]

mean value: 0.8533076337331049

key: test_jcc
value: [0.58333333 0.73333333 0.5        0.4        0.53846154 0.61538462
 0.61538462 0.53333333 0.42857143 0.42857143]

mean value: 0.5376373626373626

key: train_jcc
value: [0.72649573 0.73214286 0.73873874 0.73636364 0.72072072 0.72072072
 0.69026549 0.75652174 0.73873874 0.75892857]

mean value: 0.7319636936205809

MCC on Blind test: 0.33

Accuracy on Blind test: 0.67

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.5654161  0.70051932 0.43851113 0.8557241  0.81149411 0.945997
 0.29760575 0.19453859 0.41354036 0.27692008]

mean value: 0.5500266551971436

key: score_time
value: [0.01253867 0.01216602 0.01216125 0.01217246 0.01259136 0.01223016
 0.01214933 0.01222849 0.01218319 0.02718925]

mean value: 0.013761019706726075

key: test_mcc
value: [0.65151515 0.63327851 0.38932432 0.39727608 0.56818182 0.74242424
 0.63327851 0.12406456 0.47140452 0.23570226]

mean value: 0.48464499668963007

key: train_mcc
value: [0.64278523 0.8360404  0.75277897 0.79983884 0.89371934 0.87321531
 0.54046344 0.58230118 0.58157543 0.48196269]

mean value: 0.6984680827686014

key: test_accuracy
value: [0.82608696 0.7826087  0.69565217 0.69565217 0.7826087  0.86956522
 0.7826087  0.56521739 0.68181818 0.59090909]

mean value: 0.7272727272727273

key: train_accuracy
value: [0.8        0.91707317 0.86829268 0.89756098 0.94634146 0.93658537
 0.76097561 0.78536585 0.75728155 0.69417476]

mean value: 0.8363651432630831

key: test_fscore
value: [0.81818182 0.81481481 0.66666667 0.63157895 0.7826087  0.86956522
 0.73684211 0.61538462 0.75862069 0.68965517]

mean value: 0.7383918742791937

key: train_fscore
value: [0.83127572 0.92018779 0.85405405 0.89230769 0.94472362 0.93658537
 0.72316384 0.80357143 0.80314961 0.76404494]

mean value: 0.8473064064396472

key: test_precision
value: [0.81818182 0.6875     0.7        0.75       0.81818182 0.90909091
 1.         0.57142857 0.61111111 0.55555556]

mean value: 0.7421049783549784

key: train_precision
value: [0.72142857 0.89090909 0.96341463 0.94565217 0.96907216 0.93203883
 0.85333333 0.73770492 0.67549669 0.62195122]

mean value: 0.8311001629916994

key: test_recall
value: [0.81818182 1.         0.63636364 0.54545455 0.75       0.83333333
 0.58333333 0.66666667 1.         0.90909091]

mean value: 0.7742424242424243

key: train_recall
value: [0.98058252 0.95145631 0.76699029 0.84466019 0.92156863 0.94117647
 0.62745098 0.88235294 0.99029126 0.99029126]

mean value: 0.8896820864268037

key: test_roc_auc
value: [0.82575758 0.79166667 0.69318182 0.68939394 0.78409091 0.87121212
 0.79166667 0.56060606 0.68181818 0.59090909]

mean value: 0.728030303030303

key: train_roc_auc
value: [0.79911479 0.91690463 0.86878926 0.89782029 0.94622121 0.93660765
 0.76032743 0.78583666 0.75728155 0.69417476]

mean value: 0.836307824100514

key: test_jcc
value: [0.69230769 0.6875     0.5        0.46153846 0.64285714 0.76923077
 0.58333333 0.44444444 0.61111111 0.52631579]

mean value: 0.5918638744296639

key: train_jcc
value: [0.71126761 0.85217391 0.74528302 0.80555556 0.8952381  0.88073394
 0.56637168 0.67164179 0.67105263 0.61818182]

mean value: 0.7417500055514455

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01879621 0.01864338 0.01436448 0.01555657 0.01544523 0.01569748
 0.01556516 0.0156126  0.01505065 0.01566124]

mean value: 0.016039299964904784

key: score_time
value: [0.01181293 0.00982785 0.00936031 0.00934577 0.00941539 0.00954008
 0.00944448 0.00944018 0.00936866 0.00942349]

mean value: 0.009697914123535156

key: test_mcc
value: [0.76277007 0.41096386 0.48856385 1.         0.76764947 0.83971912
 0.83743579 0.91605722 0.91287093 0.91287093]

mean value: 0.7848901253107335

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.69565217 0.73913043 1.         0.86956522 0.91304348
 0.91304348 0.95652174 0.95454545 0.95454545]

mean value: 0.8865612648221344

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84210526 0.72       0.75       1.         0.85714286 0.90909091
 0.92307692 0.96       0.95652174 0.95238095]

mean value: 0.8870318643979971

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.64285714 0.69230769 1.         1.         1.
 0.85714286 0.92307692 0.91666667 1.        ]

mean value: 0.9032051282051282

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.81818182 0.81818182 1.         0.75       0.83333333
 1.         1.         1.         0.90909091]

mean value: 0.8856060606060606

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86363636 0.70075758 0.74242424 1.         0.875      0.91666667
 0.90909091 0.95454545 0.95454545 0.95454545]

mean value: 0.8871212121212121

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.72727273 0.5625     0.6        1.         0.75       0.83333333
 0.85714286 0.92307692 0.91666667 0.90909091]

mean value: 0.8079083416583417

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.14

Accuracy on Blind test: 0.56

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10700893 0.10751104 0.10628223 0.1066637  0.10690093 0.10731983
 0.1071713  0.10743928 0.1068995  0.10708857]

mean value: 0.10702853202819824

key: score_time
value: [0.01878786 0.01899338 0.01907802 0.0190022  0.01901197 0.0189383
 0.01898313 0.01904893 0.0191102  0.019032  ]

mean value: 0.018998599052429198

key: test_mcc
value: [0.66414149 0.6992059  0.48856385 0.39727608 0.41096386 0.65151515
 0.91605722 0.58002308 0.46225016 0.54772256]

mean value: 0.5817719350962288

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.82608696 0.82608696 0.73913043 0.69565217 0.69565217 0.82608696
 0.95652174 0.7826087  0.72727273 0.77272727]

mean value: 0.7847826086956522

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.83333333 0.84615385 0.75       0.63157895 0.66666667 0.83333333
 0.96       0.81481481 0.7        0.76190476]

mean value: 0.7797785703575177

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.76923077 0.73333333 0.69230769 0.75       0.77777778 0.83333333
 0.92307692 0.73333333 0.77777778 0.8       ]

mean value: 0.7790170940170941

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         0.81818182 0.54545455 0.58333333 0.83333333
 1.         0.91666667 0.63636364 0.72727273]

mean value: 0.796969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.82954545 0.83333333 0.74242424 0.68939394 0.70075758 0.82575758
 0.95454545 0.77651515 0.72727273 0.77272727]

mean value: 0.7852272727272727

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.71428571 0.73333333 0.6        0.46153846 0.5        0.71428571
 0.92307692 0.6875     0.53846154 0.61538462]

mean value: 0.6487866300366301

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.33

Accuracy on Blind test: 0.66

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01075053 0.01030135 0.00908804 0.00905085 0.00904369 0.01010799
 0.0100894  0.0101018  0.00930357 0.00907087]

mean value: 0.009690809249877929

key: score_time
value: [0.01013565 0.00923514 0.00874352 0.00862527 0.00860524 0.00943875
 0.00945067 0.00937891 0.00866175 0.00867438]

mean value: 0.009094929695129395

key: test_mcc
value: [ 0.47727273  0.48856385 -0.04545455  0.48075018  0.44411739 -0.03816905
  0.13740858  0.21374669 -0.09759001 -0.18257419]

mean value: 0.18780716298837632

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.73913043 0.47826087 0.73913043 0.69565217 0.47826087
 0.56521739 0.60869565 0.45454545 0.40909091]

mean value: 0.5907114624505929

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.75       0.45454545 0.7        0.63157895 0.45454545
 0.54545455 0.64       0.33333333 0.43478261]

mean value: 0.5671513071215588

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.72727273 0.69230769 0.45454545 0.77777778 0.85714286 0.5
 0.6        0.61538462 0.42857143 0.41666667]

mean value: 0.6069669219669219

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.81818182 0.45454545 0.63636364 0.5        0.41666667
 0.5        0.66666667 0.27272727 0.45454545]

mean value: 0.5446969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73863636 0.74242424 0.47727273 0.73484848 0.70454545 0.48106061
 0.56818182 0.60606061 0.45454545 0.40909091]

mean value: 0.5916666666666667

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57142857 0.6        0.29411765 0.53846154 0.46153846 0.29411765
 0.375      0.47058824 0.2        0.27777778]

mean value: 0.4083029878618114

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.15

Accuracy on Blind test: 0.57

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.32547903 1.41488957 1.40615559 1.33278108 1.30856895 1.30703354
 1.31076121 1.31396508 1.30783343 1.31076026]

mean value: 1.333822774887085

key: score_time
value: [0.15590978 0.09691024 0.09611034 0.08798575 0.09535575 0.09329295
 0.08852673 0.09049702 0.09529018 0.08904791]

mean value: 0.0988926649093628

key: test_mcc
value: [0.74047959 0.63327851 0.39393939 0.65151515 0.74242424 0.83971912
 0.82575758 0.65909298 0.63636364 0.73029674]

mean value: 0.6852866944934071

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.7826087  0.69565217 0.82608696 0.86956522 0.91304348
 0.91304348 0.82608696 0.81818182 0.86363636]

mean value: 0.8377470355731225

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.81481481 0.69565217 0.81818182 0.86956522 0.90909091
 0.91666667 0.84615385 0.81818182 0.85714286]

mean value: 0.8402592978679935

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.6875     0.66666667 0.81818182 0.90909091 1.
 0.91666667 0.78571429 0.81818182 0.9       ]

mean value: 0.8402002164502165

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.72727273 0.81818182 0.83333333 0.83333333
 0.91666667 0.91666667 0.81818182 0.81818182]

mean value: 0.85

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.79166667 0.6969697  0.82575758 0.87121212 0.91666667
 0.91287879 0.8219697  0.81818182 0.86363636]

mean value: 0.8386363636363636

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.6875     0.53333333 0.69230769 0.76923077 0.83333333
 0.84615385 0.73333333 0.69230769 0.75      ]

mean value: 0.72875

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.25

Accuracy on Blind test: 0.61

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.8421793  0.98309755 0.89729023 0.8816216  0.92900515 0.9211638
 0.88418436 0.92679811 0.93264508 0.93030334]

mean value: 0.9128288507461548

key: score_time
value: [0.2220726  0.20171928 0.21628141 0.22029281 0.17704153 0.23349524
 0.24436331 0.23651242 0.19798326 0.22944474]

mean value: 0.21792066097259521

key: test_mcc
value: [0.65151515 0.63327851 0.48856385 0.65151515 0.56490196 0.83971912
 0.74047959 0.74047959 0.63636364 0.73029674]

mean value: 0.6677113300585276

key: train_mcc
value: [0.97077583 0.97077583 0.98067223 0.9516192  0.96116136 0.96116136
 0.95163291 0.9707786  0.95186015 0.97091955]

mean value: 0.9641356995103791

key: test_accuracy
value: [0.82608696 0.7826087  0.73913043 0.82608696 0.7826087  0.91304348
 0.86956522 0.86956522 0.81818182 0.86363636]

mean value: 0.8290513833992095

key: train_accuracy
value: [0.98536585 0.98536585 0.9902439  0.97560976 0.9804878  0.9804878
 0.97560976 0.98536585 0.97572816 0.98543689]

mean value: 0.9819701633909543

key: test_fscore
value: [0.81818182 0.81481481 0.75       0.81818182 0.8        0.90909091
 0.88       0.88       0.81818182 0.85714286]

mean value: 0.8345594035594036

key: train_fscore
value: [0.98550725 0.98550725 0.99038462 0.97607656 0.98058252 0.98058252
 0.97584541 0.98536585 0.97607656 0.98550725]

mean value: 0.9821435777393142

key: test_precision
value: [0.81818182 0.6875     0.69230769 0.81818182 0.76923077 1.
 0.84615385 0.84615385 0.81818182 0.9       ]

mean value: 0.8195891608391609

key: train_precision
value: [0.98076923 0.98076923 0.98095238 0.96226415 0.97115385 0.97115385
 0.96190476 0.98058252 0.96226415 0.98076923]

mean value: 0.9732583353631165

key: test_recall
value: [0.81818182 1.         0.81818182 0.81818182 0.83333333 0.83333333
 0.91666667 0.91666667 0.81818182 0.81818182]

mean value: 0.8590909090909091

key: train_recall
value: [0.99029126 0.99029126 1.         0.99029126 0.99019608 0.99019608
 0.99019608 0.99019608 0.99029126 0.99029126]

mean value: 0.9912240624405102

key: test_roc_auc
value: [0.82575758 0.79166667 0.74242424 0.82575758 0.78030303 0.91666667
 0.86742424 0.86742424 0.81818182 0.86363636]

mean value: 0.8299242424242425

key: train_roc_auc
value: [0.98534171 0.98534171 0.99019608 0.97553779 0.98053493 0.98053493
 0.97568056 0.9853893  0.97572816 0.98543689]

mean value: 0.9819722063582714

key: test_jcc
value: [0.69230769 0.6875     0.6        0.69230769 0.66666667 0.83333333
 0.78571429 0.78571429 0.69230769 0.75      ]

mean value: 0.7185851648351649

key: train_jcc
value: [0.97142857 0.97142857 0.98095238 0.95327103 0.96190476 0.96190476
 0.95283019 0.97115385 0.95327103 0.97142857]

mean value: 0.9649573709955477

MCC on Blind test: 0.28

Accuracy on Blind test: 0.62

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02420473 0.01008201 0.01005197 0.01018906 0.01016712 0.01026821
 0.01024914 0.01026487 0.01008368 0.0101018 ]

mean value: 0.011566257476806641

key: score_time
value: [0.01045871 0.00948524 0.00967598 0.00949144 0.00947976 0.0094893
 0.00947714 0.00952911 0.00955606 0.00959349]

mean value: 0.00962362289428711

key: test_mcc
value: [0.65909298 0.21452908 0.12336594 0.21452908 0.08257228 0.44411739
 0.08257228 0.23262105 0.32539569 0.23570226]

mean value: 0.26144980489209724

key: train_mcc
value: [0.431714   0.44379575 0.47690661 0.38794503 0.41929975 0.43858746
 0.45614118 0.4454215  0.40723148 0.39531893]

mean value: 0.430236169300088

key: test_accuracy
value: [0.82608696 0.60869565 0.56521739 0.60869565 0.52173913 0.69565217
 0.52173913 0.60869565 0.63636364 0.59090909]

mean value: 0.6183794466403162

key: train_accuracy
value: [0.70243902 0.70731707 0.72195122 0.67804878 0.69268293 0.70731707
 0.71707317 0.70731707 0.69417476 0.68446602]

mean value: 0.7012787118162443

key: test_fscore
value: [0.8        0.52631579 0.44444444 0.52631579 0.35294118 0.63157895
 0.35294118 0.57142857 0.5        0.4       ]

mean value: 0.5105965895129981

key: train_fscore
value: [0.64327485 0.64705882 0.66272189 0.60240964 0.61349693 0.64705882
 0.6627907  0.63855422 0.64       0.61538462]

mean value: 0.6372750495347176

key: test_precision
value: [0.88888889 0.625      0.57142857 0.625      0.6        0.85714286
 0.6        0.66666667 0.8        0.75      ]

mean value: 0.6984126984126984

key: train_precision
value: [0.80882353 0.82089552 0.84848485 0.79365079 0.81967213 0.80882353
 0.81428571 0.828125   0.77777778 0.78787879]

mean value: 0.8108417634437052

key: test_recall
value: [0.72727273 0.45454545 0.36363636 0.45454545 0.25       0.5
 0.25       0.5        0.36363636 0.27272727]

mean value: 0.41363636363636364

key: train_recall
value: [0.53398058 0.53398058 0.54368932 0.48543689 0.49019608 0.53921569
 0.55882353 0.51960784 0.54368932 0.50485437]

mean value: 0.5253474205216067

key: test_roc_auc
value: [0.8219697  0.60227273 0.55681818 0.60227273 0.53409091 0.70454545
 0.53409091 0.61363636 0.63636364 0.59090909]

mean value: 0.6196969696969696

key: train_roc_auc
value: [0.7032648  0.70816676 0.72282505 0.67899296 0.69169998 0.70650105
 0.71630497 0.70640586 0.69417476 0.68446602]

mean value: 0.7012802208261946

key: test_jcc
value: [0.66666667 0.35714286 0.28571429 0.35714286 0.21428571 0.46153846
 0.21428571 0.4        0.33333333 0.25      ]

mean value: 0.354010989010989

key: train_jcc
value: [0.47413793 0.47826087 0.49557522 0.43103448 0.44247788 0.47826087
 0.49565217 0.46902655 0.47058824 0.44444444]

mean value: 0.4679458652592843

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.2580533  0.05313063 0.05868006 0.05744362 0.05369687 0.06253695
 0.06104183 0.06061697 0.06067872 0.07018757]

mean value: 0.07960665225982666

key: score_time
value: [0.01125717 0.01169109 0.01044297 0.01053381 0.01151872 0.0110507
 0.01123476 0.01140285 0.01065278 0.01143217]

mean value: 0.011121702194213868

key: test_mcc
value: [0.58002308 0.58930667 0.66414149 1.         0.74242424 0.83971912
 0.83743579 1.         1.         1.        ]

mean value: 0.8253050384253398

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.7826087  0.7826087  0.82608696 1.         0.86956522 0.91304348
 0.91304348 1.         1.         1.        ]

mean value: 0.908695652173913

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.73684211 0.8        0.83333333 1.         0.86956522 0.90909091
 0.92307692 1.         1.         1.        ]

mean value: 0.9071908488155628

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.71428571 0.76923077 1.         0.90909091 1.
 0.85714286 1.         1.         1.        ]

mean value: 0.912475024975025

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.90909091 0.90909091 1.         0.83333333 0.83333333
 1.         1.         1.         1.        ]

mean value: 0.9121212121212121

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.77651515 0.78787879 0.82954545 1.         0.87121212 0.91666667
 0.90909091 1.         1.         1.        ]

mean value: 0.9090909090909091

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.58333333 0.66666667 0.71428571 1.         0.76923077 0.83333333
 0.85714286 1.         1.         1.        ]

mean value: 0.8423992673992674

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.52

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02735448 0.02864647 0.0365417  0.02909827 0.02844095 0.02818394
 0.05325174 0.04603481 0.02615213 0.02718377]

mean value: 0.03308882713317871

key: score_time
value: [0.01265574 0.0118897  0.01186728 0.01191807 0.01186371 0.01190019
 0.02155924 0.01197457 0.01198792 0.0118432 ]

mean value: 0.012945961952209473

key: test_mcc
value: [0.48075018 0.65909298 0.76764947 0.56490196 0.58930667 0.65909298
 0.65151515 0.58930667 0.81818182 0.68313005]

mean value: 0.6462927922925813

key: train_mcc
value: [0.93174679 0.96116136 0.95126131 0.94146202 0.9707786  0.96116136
 0.96116136 0.96097468 0.93208276 0.94192516]

mean value: 0.9513715399106522

key: test_accuracy
value: [0.73913043 0.82608696 0.86956522 0.7826087  0.7826087  0.82608696
 0.82608696 0.7826087  0.90909091 0.81818182]

mean value: 0.8162055335968379

key: train_accuracy
value: [0.96585366 0.9804878  0.97560976 0.97073171 0.98536585 0.9804878
 0.9804878  0.9804878  0.96601942 0.97087379]

mean value: 0.9756405399005447

key: test_fscore
value: [0.7        0.8        0.88       0.76190476 0.76190476 0.84615385
 0.83333333 0.76190476 0.90909091 0.77777778]

mean value: 0.8032070152070152

key: train_fscore
value: [0.96618357 0.98039216 0.97584541 0.97087379 0.98536585 0.98058252
 0.98058252 0.98039216 0.96618357 0.97115385]

mean value: 0.9757555408875802

key: test_precision
value: [0.77777778 0.88888889 0.78571429 0.8        0.88888889 0.78571429
 0.83333333 0.88888889 0.90909091 1.        ]

mean value: 0.8558297258297258

key: train_precision
value: [0.96153846 0.99009901 0.97115385 0.97087379 0.98058252 0.97115385
 0.97115385 0.98039216 0.96153846 0.96190476]

mean value: 0.972039070088657

key: test_recall
value: [0.63636364 0.72727273 1.         0.72727273 0.66666667 0.91666667
 0.83333333 0.66666667 0.90909091 0.63636364]

mean value: 0.771969696969697

key: train_recall
value: [0.97087379 0.97087379 0.98058252 0.97087379 0.99019608 0.99019608
 0.99019608 0.98039216 0.97087379 0.98058252]

mean value: 0.979564058633162

key: test_roc_auc
value: [0.73484848 0.8219697  0.875      0.78030303 0.78787879 0.8219697
 0.82575758 0.78787879 0.90909091 0.81818182]

mean value: 0.8162878787878787

key: train_roc_auc
value: [0.96582905 0.98053493 0.97558538 0.97073101 0.9853893  0.98053493
 0.98053493 0.98048734 0.96601942 0.97087379]

mean value: 0.9756520083761661

key: test_jcc
value: [0.53846154 0.66666667 0.78571429 0.61538462 0.61538462 0.73333333
 0.71428571 0.61538462 0.83333333 0.63636364]

mean value: 0.6754312354312354

key: train_jcc
value: [0.93457944 0.96153846 0.95283019 0.94339623 0.97115385 0.96190476
 0.96190476 0.96153846 0.93457944 0.94392523]

mean value: 0.9527350820284165

MCC on Blind test: 0.18

Accuracy on Blind test: 0.59
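
Note: each "mean value" line appears to be the arithmetic mean of the ten per-fold scores printed directly above it. A minimal sketch that reproduces the LDA test_mcc mean from the fold values logged above; the variable names are illustrative and are not taken from the original script.

```python
from statistics import fmean

# Per-fold test_mcc values copied from the LDA block of this log.
lda_test_mcc = [0.48075018, 0.65909298, 0.76764947, 0.56490196, 0.58930667,
                0.65909298, 0.65151515, 0.58930667, 0.81818182, 0.68313005]

# Arithmetic mean over the 10 folds; the log reports 0.6462927922925813.
lda_mean_mcc = fmean(lda_test_mcc)
print(round(lda_mean_mcc, 6))
```

The same check can be applied to any other key/value pair in this log by substituting the fold values.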

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02102685 0.0097661  0.00906491 0.00905061 0.008991   0.00937319
 0.0090394  0.00941396 0.00884175 0.00902915]

mean value: 0.010359692573547363

key: score_time
value: [0.00948429 0.00991464 0.00870991 0.00855541 0.0086112  0.00870061
 0.00849533 0.00886989 0.00867701 0.00849438]

mean value: 0.008851265907287598

key: test_mcc
value: [0.38932432 0.23262105 0.3030303  0.12878788 0.39393939 0.21374669
 0.5164589  0.21969697 0.09245003 0.37796447]

mean value: 0.2868020012621178

key: train_mcc
value: [0.35608875 0.3660859  0.37560698 0.42436935 0.3755949  0.35621133
 0.3658258  0.41462022 0.43763636 0.40824829]

mean value: 0.3880287875437212

key: test_accuracy
value: [0.69565217 0.60869565 0.65217391 0.56521739 0.69565217 0.60869565
 0.73913043 0.60869565 0.54545455 0.68181818]

mean value: 0.6401185770750988

key: train_accuracy
value: [0.67804878 0.68292683 0.68780488 0.71219512 0.68780488 0.67804878
 0.68292683 0.70731707 0.7184466  0.7038835 ]

mean value: 0.6939403267819086

key: test_fscore
value: [0.66666667 0.64       0.63636364 0.54545455 0.69565217 0.64
 0.7        0.60869565 0.58333333 0.63157895]

mean value: 0.634774495527356

key: train_fscore
value: [0.68269231 0.67980296 0.69230769 0.71497585 0.68627451 0.67961165
 0.67980296 0.70588235 0.72641509 0.71090047]

mean value: 0.6958665838244484

key: test_precision
value: [0.7        0.57142857 0.63636364 0.54545455 0.72727273 0.61538462
 0.875      0.63636364 0.53846154 0.75      ]

mean value: 0.659572927072927

key: train_precision
value: [0.67619048 0.69       0.68571429 0.71153846 0.68627451 0.67307692
 0.68316832 0.70588235 0.70642202 0.69444444]

mean value: 0.6912711788889996

key: test_recall
value: [0.63636364 0.72727273 0.63636364 0.54545455 0.66666667 0.66666667
 0.58333333 0.58333333 0.63636364 0.54545455]

mean value: 0.6227272727272727

key: train_recall
value: [0.68932039 0.66990291 0.69902913 0.7184466  0.68627451 0.68627451
 0.67647059 0.70588235 0.74757282 0.72815534]

mean value: 0.7007329145250334

key: test_roc_auc
value: [0.69318182 0.61363636 0.65151515 0.56439394 0.6969697  0.60606061
 0.74621212 0.60984848 0.54545455 0.68181818]

mean value: 0.6409090909090909

key: train_roc_auc
value: [0.67799353 0.68299067 0.68774986 0.71216448 0.68779745 0.67808871
 0.68289549 0.70731011 0.7184466  0.7038835 ]

mean value: 0.6939320388349515

key: test_jcc
value: [0.5        0.47058824 0.46666667 0.375      0.53333333 0.47058824
 0.53846154 0.4375     0.41176471 0.46153846]

mean value: 0.46654411764705883

key: train_jcc
value: [0.51824818 0.51492537 0.52941176 0.55639098 0.52238806 0.51470588
 0.51492537 0.54545455 0.57037037 0.55147059]

mean value: 0.5338291109715274

MCC on Blind test: 0.36

Accuracy on Blind test: 0.68

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01177096 0.01734757 0.01680732 0.01921105 0.01541638 0.01629424
 0.01723695 0.01650763 0.01626301 0.01551557]

mean value: 0.01623706817626953

key: score_time
value: [0.00875068 0.01169848 0.01196933 0.01190495 0.01174927 0.01198626
 0.01195526 0.01173544 0.01152921 0.01165438]

mean value: 0.011493325233459473

key: test_mcc
value: [0.91666667 0.5164589  0.41096386 0.74242424 0.56818182 0.6992059
 0.58930667 0.74242424 0.54232614 0.36514837]

mean value: 0.6093106813380207

key: train_mcc
value: [0.80613459 0.94164684 0.91429989 0.94163576 0.88310329 0.82593778
 0.67701604 0.91224062 0.79681907 0.89527379]

mean value: 0.8594107664325119

key: test_accuracy
value: [0.95652174 0.73913043 0.69565217 0.86956522 0.7826087  0.82608696
 0.7826087  0.86956522 0.72727273 0.68181818]

mean value: 0.7930830039525691

key: train_accuracy
value: [0.89756098 0.97073171 0.95609756 0.97073171 0.94146341 0.90731707
 0.81463415 0.95609756 0.88834951 0.94660194]

mean value: 0.9249585602652143

key: test_fscore
value: [0.95652174 0.76923077 0.72       0.86956522 0.7826087  0.8
 0.76190476 0.86956522 0.78571429 0.66666667]

mean value: 0.79817773530817

key: train_fscore
value: [0.9058296  0.97058824 0.95774648 0.97115385 0.94174757 0.89839572
 0.77108434 0.95609756 0.89956332 0.94835681]

mean value: 0.9220563476088464

key: test_precision
value: [0.91666667 0.66666667 0.64285714 0.83333333 0.81818182 1.
 0.88888889 0.90909091 0.64705882 0.7       ]

mean value: 0.8022744249214837

key: train_precision
value: [0.84166667 0.98019802 0.92727273 0.96190476 0.93269231 0.98823529
 1.         0.95145631 0.81746032 0.91818182]

mean value: 0.9319068223777838

key: test_recall
value: [1.         0.90909091 0.81818182 0.90909091 0.75       0.66666667
 0.66666667 0.83333333 1.         0.63636364]

mean value: 0.818939393939394

key: train_recall
value: [0.98058252 0.96116505 0.99029126 0.98058252 0.95098039 0.82352941
 0.62745098 0.96078431 1.         0.98058252]

mean value: 0.9255948981534361

key: test_roc_auc
value: [0.95833333 0.74621212 0.70075758 0.87121212 0.78409091 0.83333333
 0.78787879 0.87121212 0.72727273 0.68181818]

mean value: 0.7962121212121211

key: train_roc_auc
value: [0.89715401 0.9707786  0.95592994 0.97068342 0.94150961 0.90691034
 0.81372549 0.95612031 0.88834951 0.94660194]

mean value: 0.9247763182943081

key: test_jcc
value: [0.91666667 0.625      0.5625     0.76923077 0.64285714 0.66666667
 0.61538462 0.76923077 0.64705882 0.5       ]

mean value: 0.6714595453566042

key: train_jcc
value: [0.82786885 0.94285714 0.91891892 0.94392523 0.88990826 0.81553398
 0.62745098 0.91588785 0.81746032 0.90178571]

mean value: 0.8601597247948675

MCC on Blind test: 0.31

Accuracy on Blind test: 0.65

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01530957 0.01498342 0.01542163 0.01644206 0.01473165 0.01477385
 0.01383638 0.0155046  0.01537108 0.01408386]

mean value: 0.015045809745788574

key: score_time
value: [0.01238942 0.01142144 0.01168561 0.01138973 0.01165462 0.01171398
 0.01134014 0.01167512 0.01169944 0.01163602]

mean value: 0.011660552024841309

key: test_mcc
value: [0.32232919 0.41096386 0.3030303  0.56490196 0.76277007 0.82575758
 0.40451992 0.66414149 0.40824829 0.40824829]

mean value: 0.5074910937104659

key: train_mcc
value: [0.47469541 0.90259929 0.9707786  0.92479811 0.86761151 0.86052253
 0.37926401 0.803912   0.61850654 0.82977382]

mean value: 0.7632461822687774

key: test_accuracy
value: [0.60869565 0.69565217 0.65217391 0.7826087  0.86956522 0.91304348
 0.65217391 0.82608696 0.68181818 0.68181818]

mean value: 0.7363636363636363

key: train_accuracy
value: [0.68292683 0.95121951 0.98536585 0.96097561 0.93170732 0.92682927
 0.62439024 0.89268293 0.77669903 0.90776699]

mean value: 0.8640563580393086

key: test_fscore
value: [0.30769231 0.72       0.63636364 0.76190476 0.88888889 0.91666667
 0.75       0.81818182 0.74074074 0.58823529]

mean value: 0.7128674114556468

key: train_fscore
value: [0.53900709 0.95192308 0.98536585 0.95959596 0.93457944 0.92146597
 0.72597865 0.87912088 0.81746032 0.89839572]

mean value: 0.8612892956408041

key: test_precision
value: [1.         0.64285714 0.63636364 0.8        0.8        0.91666667
 0.6        0.9        0.625      0.83333333]

mean value: 0.775422077922078

key: train_precision
value: [1.         0.94285714 0.99019608 1.         0.89285714 0.98876404
 0.5698324  1.         0.69127517 1.        ]

mean value: 0.907578197910935

key: test_recall
value: [0.18181818 0.81818182 0.63636364 0.72727273 1.         0.91666667
 1.         0.75       0.90909091 0.45454545]

mean value: 0.7393939393939394

key: train_recall
value: [0.36893204 0.96116505 0.98058252 0.9223301  0.98039216 0.8627451
 1.         0.78431373 1.         0.81553398]

mean value: 0.8675994669712546

key: test_roc_auc
value: [0.59090909 0.70075758 0.65151515 0.78030303 0.86363636 0.91287879
 0.63636364 0.82954545 0.68181818 0.68181818]

mean value: 0.7329545454545454

key: train_roc_auc
value: [0.68446602 0.95117076 0.9853893  0.96116505 0.93194365 0.92651818
 0.62621359 0.89215686 0.77669903 0.90776699]

mean value: 0.8643489434608795

key: test_jcc
value: [0.18181818 0.5625     0.46666667 0.61538462 0.8        0.84615385
 0.6        0.69230769 0.58823529 0.41666667]

mean value: 0.5769732963115316

key: train_jcc
value: [0.36893204 0.90825688 0.97115385 0.9223301  0.87719298 0.85436893
 0.5698324  0.78431373 0.69127517 0.81553398]

mean value: 0.7763190053397688

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.14774132 0.12597418 0.12698054 0.12672639 0.12538242 0.1247561
 0.12422967 0.12333274 0.12338328 0.1225481 ]

mean value: 0.1271054744720459

key: score_time
value: [0.01492047 0.01496315 0.0151732  0.01502323 0.01556945 0.01488686
 0.01481652 0.01488638 0.0149827  0.01499629]

mean value: 0.015021824836730957

key: test_mcc
value: [0.91605722 0.58930667 0.66414149 1.         0.66414149 0.91666667
 0.76277007 0.82575758 1.         1.        ]

mean value: 0.8338841181236702

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95652174 0.7826087  0.82608696 1.         0.82608696 0.95652174
 0.86956522 0.91304348 1.         1.        ]

mean value: 0.9130434782608696

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95238095 0.8        0.83333333 1.         0.81818182 0.95652174
 0.88888889 0.91666667 1.         1.        ]

mean value: 0.9165973398582095

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.71428571 0.76923077 1.         0.9        1.
 0.8        0.91666667 1.         1.        ]

mean value: 0.910018315018315

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 0.90909091 1.         0.75       0.91666667
 1.         0.91666667 1.         1.        ]

mean value: 0.931060606060606

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95454545 0.78787879 0.82954545 1.         0.82954545 0.95833333
 0.86363636 0.91287879 1.         1.        ]

mean value: 0.9136363636363636

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90909091 0.66666667 0.71428571 1.         0.69230769 0.91666667
 0.8        0.84615385 1.         1.        ]

mean value: 0.8545171495171495

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.52
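
Note: for AdaBoost the training scores are all 1.0 and the cross-validated test MCC averages about 0.83, yet the blind-test MCC is 0.05. A minimal sketch that quantifies these gaps from the values logged above; the names, and the idea of reporting the gaps as a quick overfitting check, are illustrative assumptions and not part of the original script.

```python
from statistics import fmean

# Values copied from the AdaBoost block of this log.
ada_test_mcc = [0.91605722, 0.58930667, 0.66414149, 1.0, 0.66414149,
                0.91666667, 0.76277007, 0.82575758, 1.0, 1.0]
ada_train_mcc_mean = 1.0   # logged mean of train_mcc
ada_blind_mcc = 0.05       # logged "MCC on Blind test" (rounded in the log)

# Mean cross-validated test MCC over the 10 folds.
ada_cv_mcc_mean = fmean(ada_test_mcc)

# Hypothetical helper quantities: larger gaps suggest weaker generalisation.
train_cv_gap = ada_train_mcc_mean - ada_cv_mcc_mean
cv_blind_gap = ada_cv_mcc_mean - ada_blind_mcc
print(f"train-CV gap: {train_cv_gap:.3f}, CV-blind gap: {cv_blind_gap:.3f}")
```

A large CV-to-blind gap, as seen here, is one sign that the cross-validation estimate may not carry over to the held-out data.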

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04642153 0.04462147 0.04027104 0.04553676 0.04686022 0.05081582
 0.03903246 0.04629207 0.04119587 0.04431295]

mean value: 0.04453601837158203

key: score_time
value: [0.01880813 0.02288532 0.02414727 0.0243566  0.02825975 0.02347636
 0.02017665 0.02507305 0.02807355 0.02321911]

mean value: 0.023847579956054688

key: test_mcc
value: [0.50168817 0.58930667 0.56818182 1.         0.76764947 0.74242424
 0.83743579 0.91666667 0.91287093 0.81818182]

mean value: 0.7654405571559706

key: train_mcc
value: [1.         0.98048734 0.98067587 1.         0.99029034 0.98067223
 0.99029034 0.97114302 0.97128586 0.99033794]

mean value: 0.9855182943104235

key: test_accuracy
value: [0.73913043 0.7826087  0.7826087  1.         0.86956522 0.86956522
 0.91304348 0.95652174 0.95454545 0.90909091]

mean value: 0.8776679841897234

key: train_accuracy
value: [1.         0.9902439  0.9902439  1.         0.99512195 0.9902439
 0.99512195 0.98536585 0.98543689 0.99514563]

mean value: 0.9926923987686479

key: test_fscore
value: [0.66666667 0.8        0.7826087  1.         0.85714286 0.86956522
 0.92307692 0.95652174 0.95238095 0.90909091]

mean value: 0.8717053960532221

key: train_fscore
value: [1.         0.99029126 0.99019608 1.         0.99507389 0.99009901
 0.99507389 0.98507463 0.98522167 0.99512195]

mean value: 0.9926152386681547

key: test_precision
value: [0.85714286 0.71428571 0.75       1.         1.         0.90909091
 0.85714286 1.         1.         0.90909091]

mean value: 0.8996753246753246

key: train_precision
value: [1.         0.99029126 1.         1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9990291262135922

key: test_recall
value: [0.54545455 0.90909091 0.81818182 1.         0.75       0.83333333
 1.         0.91666667 0.90909091 0.90909091]

mean value: 0.8590909090909091

key: train_recall
value: [1.         0.99029126 0.98058252 1.         0.99019608 0.98039216
 0.99019608 0.97058824 0.97087379 0.99029126]

mean value: 0.9863411383971065

key: test_roc_auc
value: [0.73106061 0.78787879 0.78409091 1.         0.875      0.87121212
 0.90909091 0.95833333 0.95454545 0.90909091]

mean value: 0.878030303030303

key: train_roc_auc
value: [1.         0.99024367 0.99029126 1.         0.99509804 0.99019608
 0.99509804 0.98529412 0.98543689 0.99514563]

mean value: 0.9926803731201218

key: test_jcc
value: [0.5        0.66666667 0.64285714 1.         0.75       0.76923077
 0.85714286 0.91666667 0.90909091 0.83333333]

mean value: 0.7844988344988345

key: train_jcc
value: [1.         0.98076923 0.98058252 1.         0.99019608 0.98039216
 0.99019608 0.97058824 0.97087379 0.99029126]

mean value: 0.9853889352604372

MCC on Blind test: 0.03

Accuracy on Blind test: 0.51

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.045012   0.04318142 0.0282228  0.02828217 0.0310216  0.08236885
 0.0660882  0.06425071 0.07074213 0.07848597]

mean value: 0.05376558303833008

key: score_time
value: [0.02342558 0.01257706 0.01256967 0.01259303 0.02418852 0.02631974
 0.02399254 0.02026963 0.02379179 0.02296281]

mean value: 0.020269036293029785

key: test_mcc
value: [0.47727273 0.48856385 0.31252706 0.03816905 0.5164589  0.66414149
 0.5164589  0.38932432 0.29277002 0.18898224]

mean value: 0.3884668558613338

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99033794]

mean value: 0.9990337937660287

key: test_accuracy
value: [0.73913043 0.73913043 0.65217391 0.52173913 0.73913043 0.82608696
 0.73913043 0.69565217 0.63636364 0.59090909]

mean value: 0.6879446640316206

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99514563]

mean value: 0.9995145631067961

key: test_fscore
value: [0.72727273 0.75       0.55555556 0.47619048 0.7        0.81818182
 0.7        0.72       0.55555556 0.52631579]

mean value: 0.6529071922229817

key: train_fscore
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99512195]

mean value: 0.9995121951219512

key: test_precision
value: [0.72727273 0.69230769 0.71428571 0.5        0.875      0.9
 0.875      0.69230769 0.71428571 0.625     ]

mean value: 0.731545954045954

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.81818182 0.45454545 0.45454545 0.58333333 0.75
 0.58333333 0.75       0.45454545 0.45454545]

mean value: 0.603030303030303

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99029126]

mean value: 0.9990291262135922

key: test_roc_auc
value: [0.73863636 0.74242424 0.64393939 0.51893939 0.74621212 0.82954545
 0.74621212 0.69318182 0.63636364 0.59090909]

mean value: 0.6886363636363636

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99514563]

mean value: 0.9995145631067961

key: test_jcc
value: [0.57142857 0.6        0.38461538 0.3125     0.53846154 0.69230769
 0.53846154 0.5625     0.38461538 0.35714286]

mean value: 0.4942032967032967

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99029126]

mean value: 0.9990291262135922

MCC on Blind test: 0.17

Accuracy on Blind test: 0.59

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.43832111 0.41460967 0.41730499 0.418607   0.41266894 0.41043901
 0.41033554 0.40992641 0.41230726 0.40911317]

mean value: 0.4153633117675781

key: score_time
value: [0.00984025 0.00928712 0.00983596 0.00985432 0.00924468 0.0091536
 0.00908923 0.00908732 0.00944901 0.00996041]

mean value: 0.00948019027709961

key: test_mcc
value: [0.74047959 0.5164589  0.48856385 1.         0.76764947 0.91666667
 0.76277007 1.         1.         1.        ]

mean value: 0.8192588557461732

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.73913043 0.73913043 1.         0.86956522 0.95652174
 0.86956522 1.         1.         1.        ]

mean value: 0.9043478260869565

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.76923077 0.75       1.         0.85714286 0.95652174
 0.88888889 1.         1.         1.        ]

mean value: 0.9078927111535807

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.66666667 0.69230769 1.         1.         1.
 0.8        1.         1.         1.        ]

mean value: 0.9058974358974359

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.90909091 0.81818182 1.         0.75       0.91666667
 1.         1.         1.         1.        ]

mean value: 0.9212121212121213

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.74621212 0.74242424 1.         0.875      0.95833333
 0.86363636 1.         1.         1.        ]

mean value: 0.9053030303030303

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.625      0.6        1.         0.75       0.91666667
 0.8        1.         1.         1.        ]

mean value: 0.8441666666666666

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.53

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02009869 0.02192163 0.02121377 0.02089286 0.03631091 0.02019191
 0.0366838  0.02073383 0.03662992 0.03478622]

mean value: 0.026946353912353515

key: score_time
value: [0.01233768 0.01223588 0.01664376 0.01710796 0.01231337 0.01765871
 0.01238084 0.017627   0.01230049 0.01220369]

mean value: 0.014280939102172851

key: test_mcc
value: [0.47727273 0.83971912 0.23262105 0.56818182 0.91605722 0.62050523
 0.91605722 0.65909298 0.63636364 0.64715023]

mean value: 0.6513021246123849

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.91304348 0.60869565 0.7826087  0.95652174 0.7826087
 0.95652174 0.82608696 0.81818182 0.81818182]

mean value: 0.8201581027667985

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.91666667 0.64       0.7826087  0.96       0.82758621
 0.96       0.84615385 0.81818182 0.83333333]

mean value: 0.8311803294157117

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.72727273 0.84615385 0.57142857 0.75       0.92307692 0.70588235
 0.92307692 0.78571429 0.81818182 0.76923077]

mean value: 0.782001821707704

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 1.         0.72727273 0.81818182 1.         1.
 1.         0.91666667 0.81818182 0.90909091]

mean value: 0.8916666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73863636 0.91666667 0.61363636 0.78409091 0.95454545 0.77272727
 0.95454545 0.8219697  0.81818182 0.81818182]

mean value: 0.8193181818181818

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57142857 0.84615385 0.47058824 0.64285714 0.92307692 0.70588235
 0.92307692 0.73333333 0.69230769 0.71428571]

mean value: 0.7222990734755441

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.21

Accuracy on Blind test: 0.58

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02222586 0.02766562 0.0338943  0.02159619 0.03423214 0.03410792
 0.03486061 0.03411412 0.03433919 0.03421783]

mean value: 0.031125378608703614

key: score_time
value: [0.02374887 0.02142906 0.02060509 0.02018595 0.02212143 0.0235455
 0.02057886 0.02279162 0.0233283  0.02117443]

mean value: 0.021950912475585938

key: test_mcc
value: [0.82575758 0.74242424 0.56818182 0.82575758 0.66414149 0.82575758
 0.74047959 0.82575758 0.63636364 0.46225016]

mean value: 0.7116871241591681

key: train_mcc
value: [0.9024367  0.93175328 0.92194936 0.91224062 0.93175328 0.93174679
 0.94163576 0.90259929 0.89324598 0.93243443]

mean value: 0.9201795515216483

key: test_accuracy
value: [0.91304348 0.86956522 0.7826087  0.91304348 0.82608696 0.91304348
 0.86956522 0.91304348 0.81818182 0.72727273]

mean value: 0.8545454545454545

key: train_accuracy
value: [0.95121951 0.96585366 0.96097561 0.95609756 0.96585366 0.96585366
 0.97073171 0.95121951 0.94660194 0.96601942]

mean value: 0.9600426237272082

key: test_fscore
value: [0.90909091 0.86956522 0.7826087  0.90909091 0.81818182 0.91666667
 0.88       0.91666667 0.81818182 0.7       ]

mean value: 0.8520052700922266

key: train_fscore
value: [0.95145631 0.96585366 0.96116505 0.95609756 0.96585366 0.96551724
 0.97029703 0.95049505 0.9468599  0.96650718]

mean value: 0.9600102638274448

key: test_precision
value: [0.90909091 0.83333333 0.75       0.90909091 0.9        0.91666667
 0.84615385 0.91666667 0.81818182 0.77777778]

mean value: 0.8576961926961927

key: train_precision
value: [0.95145631 0.97058824 0.96116505 0.96078431 0.96116505 0.97029703
 0.98       0.96       0.94230769 0.95283019]

mean value: 0.9610593867476506

key: test_recall
value: [0.90909091 0.90909091 0.81818182 0.90909091 0.75       0.91666667
 0.91666667 0.91666667 0.81818182 0.63636364]

mean value: 0.85

/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:135: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:138: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_recall
value: [0.95145631 0.96116505 0.96116505 0.95145631 0.97058824 0.96078431
 0.96078431 0.94117647 0.95145631 0.98058252]

mean value: 0.9590614886731392

key: test_roc_auc
value: [0.91287879 0.87121212 0.78409091 0.91287879 0.82954545 0.91287879
 0.86742424 0.91287879 0.81818182 0.72727273]

mean value: 0.8549242424242424

key: train_roc_auc
value: [0.95121835 0.96587664 0.96097468 0.95612031 0.96587664 0.96582905
 0.97068342 0.95117076 0.94660194 0.96601942]

mean value: 0.9600371216447744

key: test_jcc
value: [0.83333333 0.76923077 0.64285714 0.83333333 0.69230769 0.84615385
 0.78571429 0.84615385 0.69230769 0.53846154]

mean value: 0.747985347985348

key: train_jcc
value: [0.90740741 0.93396226 0.92523364 0.91588785 0.93396226 0.93333333
 0.94230769 0.90566038 0.89908257 0.93518519]

mean value: 0.9232022588028438

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.34404731 0.23672104 0.2441833  0.23749495 0.25613022 0.24685955
 0.24196959 0.28543353 0.30825043 0.23037481]

mean value: 0.2631464719772339

key: score_time
value: [0.02091432 0.02341533 0.02169251 0.01936412 0.01624894 0.0224669
 0.01370192 0.01648545 0.02128911 0.02203679]

mean value: 0.019761538505554198

key: test_mcc
value: [0.82575758 0.65909298 0.56818182 0.82575758 0.66414149 0.82575758
 0.74047959 0.82575758 0.73029674 0.46225016]

mean value: 0.7127473088509607

key: train_mcc
value: [0.9024367  0.93175328 0.92194936 0.91224062 0.95163291 0.93174679
 0.94163576 0.90259929 0.93208276 0.93243443]

mean value: 0.9260511918867643

key: test_accuracy
value: [0.91304348 0.82608696 0.7826087  0.91304348 0.82608696 0.91304348
 0.86956522 0.91304348 0.86363636 0.72727273]

mean value: 0.8547430830039525

key: train_accuracy
value: [0.95121951 0.96585366 0.96097561 0.95609756 0.97560976 0.96585366
 0.97073171 0.95121951 0.96601942 0.96601942]

mean value: 0.9629599810561212

key: test_fscore
value: [0.90909091 0.8        0.7826087  0.90909091 0.81818182 0.91666667
 0.88       0.91666667 0.85714286 0.7       ]

mean value: 0.8489448522492

key: train_fscore
value: [0.95145631 0.96585366 0.96116505 0.95609756 0.97584541 0.96551724
 0.97029703 0.95049505 0.96618357 0.96650718]

mean value: 0.9629418061863466

key: test_precision
value: [0.90909091 0.88888889 0.75       0.90909091 0.9        0.91666667
 0.84615385 0.91666667 0.9        0.77777778]

mean value: 0.8714335664335664

key: train_precision
value: [0.95145631 0.97058824 0.96116505 0.96078431 0.96190476 0.97029703
 0.98       0.96       0.96153846 0.95283019]

mean value: 0.9630564350068348

key: test_recall
value: [0.90909091 0.72727273 0.81818182 0.90909091 0.75       0.91666667
 0.91666667 0.91666667 0.81818182 0.63636364]

mean value: 0.8318181818181818

key: train_recall
value: [0.95145631 0.96116505 0.96116505 0.95145631 0.99019608 0.96078431
 0.96078431 0.94117647 0.97087379 0.98058252]

mean value: 0.9629640205596802

key: test_roc_auc
value: [0.91287879 0.8219697  0.78409091 0.91287879 0.82954545 0.91287879
 0.86742424 0.91287879 0.86363636 0.72727273]

mean value: 0.8545454545454545

key: train_roc_auc
value: [0.95121835 0.96587664 0.96097468 0.95612031 0.97568056 0.96582905
 0.97068342 0.95117076 0.96601942 0.96601942]

mean value: 0.9629592613744528

key: test_jcc
value: [0.83333333 0.66666667 0.64285714 0.83333333 0.69230769 0.84615385
 0.78571429 0.84615385 0.75       0.53846154]

mean value: 0.7434981684981685

key: train_jcc
value: [0.90740741 0.93396226 0.92523364 0.91588785 0.95283019 0.93333333
 0.94230769 0.90566038 0.93457944 0.93518519]

mean value: 0.9286387383001737

MCC on Blind test: 0.12

Accuracy on Blind test: 0.56

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0311234  0.03104281 0.03307748 0.03271508 0.03284502 0.03534842
 0.0292809  0.03247356 0.0327239  0.0364511 ]

mean value: 0.032708168029785156

key: score_time
value: [0.01306868 0.01228809 0.01467776 0.01206303 0.01464558 0.01219606
 0.01215053 0.01212502 0.0147841  0.01199746]

mean value: 0.012999629974365235

key: test_mcc
value: [0.74047959 0.5164589  0.48856385 0.56490196 0.74047959 0.83971912
 0.74047959 0.91666667 0.63636364 0.18257419]

mean value: 0.6366687092895582

key: train_mcc
value: [0.86356283 0.84451258 0.87352395 0.8350976  0.82455974 0.84407425
 0.88361919 0.81564443 0.81742389 0.91266437]

mean value: 0.8514682834181677

key: test_accuracy
value: [0.86956522 0.73913043 0.73913043 0.7826087  0.86956522 0.91304348
 0.86956522 0.95652174 0.81818182 0.59090909]

mean value: 0.8148221343873517

key: train_accuracy
value: [0.93170732 0.92195122 0.93658537 0.91707317 0.91219512 0.92195122
 0.94146341 0.90731707 0.90776699 0.95631068]

mean value: 0.9254321572341937

key: test_fscore
value: [0.85714286 0.76923077 0.75       0.76190476 0.88       0.90909091
 0.88       0.95652174 0.81818182 0.57142857]

mean value: 0.8153501426110121

key: train_fscore
value: [0.93269231 0.92380952 0.93779904 0.91943128 0.91262136 0.9223301
 0.94230769 0.90909091 0.91079812 0.95652174]

mean value: 0.926740207309033

key: test_precision
value: [0.9        0.66666667 0.69230769 0.8        0.84615385 1.
 0.84615385 1.         0.81818182 0.6       ]

mean value: 0.816946386946387

key: train_precision
value: [0.92380952 0.90654206 0.9245283  0.89814815 0.90384615 0.91346154
 0.9245283  0.88785047 0.88181818 0.95192308]

mean value: 0.9116455750144694

key: test_recall
value: [0.81818182 0.90909091 0.81818182 0.72727273 0.91666667 0.83333333
 0.91666667 0.91666667 0.81818182 0.54545455]

mean value: 0.821969696969697

key: train_recall
value: [0.94174757 0.94174757 0.95145631 0.94174757 0.92156863 0.93137255
 0.96078431 0.93137255 0.94174757 0.96116505]

mean value: 0.9424709689701123

key: test_roc_auc
value: [0.86742424 0.74621212 0.74242424 0.78030303 0.86742424 0.91666667
 0.86742424 0.95833333 0.81818182 0.59090909]

mean value: 0.8155303030303029

key: train_roc_auc
value: [0.9316581  0.92185418 0.93651247 0.91695222 0.91224062 0.92199695
 0.94155721 0.90743385 0.90776699 0.95631068]

mean value: 0.925428326670474

key: test_jcc
value: [0.75       0.625      0.6        0.61538462 0.78571429 0.83333333
 0.78571429 0.91666667 0.69230769 0.4       ]

mean value: 0.7004120879120879

key: train_jcc
value: [0.87387387 0.85840708 0.88288288 0.85087719 0.83928571 0.85585586
 0.89090909 0.83333333 0.8362069  0.91666667]

mean value: 0.8638298586987616

MCC on Blind test: 0.34

Accuracy on Blind test: 0.67

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.93439674 0.79571605 0.78386688 0.91385674 0.78201485 0.82960582
 0.88547063 0.7484262  0.93659878 0.74970579]

mean value: 0.8359658479690552

key: score_time
value: [0.01902795 0.01569152 0.0154388  0.01550126 0.01569033 0.01552463
 0.01555157 0.01228809 0.01748419 0.01834369]

mean value: 0.016054201126098632

key: test_mcc
value: [0.74047959 0.56818182 0.56818182 0.65151515 0.76764947 0.91666667
 0.56490196 0.91666667 0.75592895 0.46225016]

mean value: 0.691242224971122

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99033794]

mean value: 0.9990337937660287

key: test_accuracy
value: [0.86956522 0.7826087  0.7826087  0.82608696 0.86956522 0.95652174
 0.7826087  0.95652174 0.86363636 0.72727273]

mean value: 0.841699604743083

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99514563]

mean value: 0.9995145631067961

key: test_fscore
value: [0.85714286 0.7826087  0.7826087  0.81818182 0.85714286 0.95652174
 0.8        0.95652174 0.84210526 0.7       ]

mean value: 0.8352833665190644

key: train_fscore
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99512195]

mean value: 0.9995121951219512

key: test_precision
value: [0.9        0.75       0.75       0.81818182 1.         1.
 0.76923077 1.         1.         0.77777778]

mean value: 0.8765190365190365

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.81818182 0.81818182 0.81818182 0.75       0.91666667
 0.83333333 0.91666667 0.72727273 0.63636364]

mean value: 0.8053030303030303

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99029126]

mean value: 0.9990291262135922

key: test_roc_auc
value: [0.86742424 0.78409091 0.78409091 0.82575758 0.875      0.95833333
 0.78030303 0.95833333 0.86363636 0.72727273]

mean value: 0.8424242424242424

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99514563]

mean value: 0.9995145631067961

key: test_jcc
value: [0.75       0.64285714 0.64285714 0.69230769 0.75       0.91666667
 0.66666667 0.91666667 0.72727273 0.53846154]

mean value: 0.7243756243756244

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.99029126]

mean value: 0.9990291262135922

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59
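
Note on reading the MCC values: a minimal sketch of how the Matthews correlation coefficient relates to confusion-matrix counts, checked against `sklearn.metrics.matthews_corrcoef`. The label vectors are made up for illustration and are not taken from this run.

```python
from math import sqrt

from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical labels: 5 positives and 5 negatives, with one FN and one FP.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc_manual = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
mcc_sklearn = matthews_corrcoef(y_true, y_pred)

print("manual MCC: ", round(float(mcc_manual), 4))
print("sklearn MCC:", round(float(mcc_sklearn), 4))
```

Because MCC uses all four cells of the confusion matrix, it can drop sharply on a held-out set even when accuracy looks moderate, which is one way to read a cross-validated test_mcc near 0.69 alongside a blind-test MCC of 0.19.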

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.02677464 0.01045084 0.01013684 0.00999856 0.01005387 0.01009083
 0.00991368 0.0105505  0.01012111 0.01038074]

mean value: 0.011847162246704101

key: score_time
value: [0.01160312 0.00989413 0.00953531 0.00954342 0.00953102 0.0099299
 0.00961757 0.00996542 0.00982594 0.00982404]

mean value: 0.009926986694335938

key: test_mcc
value: [0.44411739 0.41096386 0.41096386 0.15096491 0.22407133 0.74047959
 0.42228828 0.24960096 0.09759001 0.18898224]

mean value: 0.3340022427057291

key: train_mcc
value: [0.36627048 0.417866   0.45930893 0.40305908 0.501235   0.431714
 0.49387839 0.43730041 0.46621721 0.50903935]

mean value: 0.44858888387912876

key: test_accuracy
value: [0.69565217 0.69565217 0.69565217 0.56521739 0.60869565 0.86956522
 0.69565217 0.60869565 0.54545455 0.59090909]

mean value: 0.6571146245059288

key: train_accuracy
value: [0.65853659 0.69756098 0.71707317 0.67317073 0.74146341 0.70243902
 0.73170732 0.70731707 0.73300971 0.74271845]

mean value: 0.7104996448022732

key: test_fscore
value: [0.74074074 0.72       0.72       0.61538462 0.68965517 0.88
 0.75862069 0.70967742 0.61538462 0.64      ]

mean value: 0.7089463252933775

key: train_fscore
value: [0.72868217 0.74166667 0.75833333 0.74131274 0.77056277 0.74476987
 0.76987448 0.74576271 0.72906404 0.77637131]

mean value: 0.7506400093172734

key: test_precision
value: [0.625      0.64285714 0.64285714 0.53333333 0.58823529 0.84615385
 0.64705882 0.57894737 0.53333333 0.57142857]

mean value: 0.6209204856031482

key: train_precision
value: [0.60645161 0.64963504 0.66423358 0.61538462 0.68992248 0.64963504
 0.67153285 0.65671642 0.74       0.68656716]

mean value: 0.6630078787347913

key: test_recall
value: [0.90909091 0.81818182 0.81818182 0.72727273 0.83333333 0.91666667
 0.91666667 0.91666667 0.72727273 0.72727273]

mean value: 0.831060606060606

key: train_recall
value: [0.91262136 0.86407767 0.88349515 0.93203883 0.87254902 0.87254902
 0.90196078 0.8627451  0.7184466  0.89320388]

mean value: 0.8713687416714259

key: test_roc_auc
value: [0.70454545 0.70075758 0.70075758 0.5719697  0.59848485 0.86742424
 0.68560606 0.59469697 0.54545455 0.59090909]

mean value: 0.656060606060606

key: train_roc_auc
value: [0.65729107 0.69674472 0.71625738 0.67190177 0.74209975 0.7032648
 0.73253379 0.70807158 0.73300971 0.74271845]

mean value: 0.7103893013516086

key: test_jcc
value: [0.58823529 0.5625     0.5625     0.44444444 0.52631579 0.78571429
 0.61111111 0.55       0.44444444 0.47058824]

mean value: 0.5545853604599734

key: train_jcc
value: [0.57317073 0.58940397 0.61073826 0.58895706 0.62676056 0.59333333
 0.62585034 0.59459459 0.57364341 0.63448276]

mean value: 0.6010935016383199

MCC on Blind test: 0.45

Accuracy on Blind test: 0.71

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00957608 0.0098474  0.00965476 0.00997877 0.01009202 0.00920963
 0.00911617 0.00889516 0.00902343 0.00907636]

mean value: 0.009446978569030762

key: score_time
value: [0.00914979 0.00911403 0.00903225 0.00959492 0.00927424 0.0087173
 0.00860095 0.00873423 0.00871611 0.00861526]

mean value: 0.008954906463623047

key: test_mcc
value: [0.58002308 0.12878788 0.12336594 0.21452908 0.39393939 0.39393939
 0.05427825 0.39393939 0.18257419 0.20412415]

mean value: 0.26695007377892416

key: train_mcc
value: [0.43994849 0.50824626 0.49637007 0.45056913 0.46832513 0.45757548
 0.46948042 0.48928361 0.48018451 0.46191786]

mean value: 0.47219009507999327

key: test_accuracy
value: [0.7826087  0.56521739 0.56521739 0.60869565 0.69565217 0.69565217
 0.52173913 0.69565217 0.59090909 0.59090909]

mean value: 0.6312252964426878

key: train_accuracy
value: [0.71707317 0.75121951 0.74634146 0.72195122 0.72682927 0.72682927
 0.73170732 0.74146341 0.73786408 0.72815534]

mean value: 0.7329434051622069

key: test_fscore
value: [0.73684211 0.54545455 0.44444444 0.52631579 0.69565217 0.69565217
 0.47619048 0.69565217 0.57142857 0.47058824]

mean value: 0.5858220689288127

key: train_fscore
value: [0.69473684 0.73298429 0.73195876 0.6984127  0.68539326 0.70526316
 0.70588235 0.71657754 0.71875    0.70526316]

mean value: 0.7095222063862845

key: test_precision
value: [0.875      0.54545455 0.57142857 0.625      0.72727273 0.72727273
 0.55555556 0.72727273 0.6        0.66666667]

mean value: 0.6620923520923521

key: train_precision
value: [0.75862069 0.79545455 0.78021978 0.76744186 0.80263158 0.76136364
 0.77647059 0.78823529 0.7752809  0.77011494]

mean value: 0.7775833814863701

key: test_recall
value: [0.63636364 0.54545455 0.36363636 0.45454545 0.66666667 0.66666667
 0.41666667 0.66666667 0.54545455 0.36363636]

mean value: 0.5325757575757576

key: train_recall
value: [0.6407767  0.67961165 0.68932039 0.6407767  0.59803922 0.65686275
 0.64705882 0.65686275 0.66990291 0.65048544]

mean value: 0.6529697315819532

key: test_roc_auc
value: [0.77651515 0.56439394 0.55681818 0.60227273 0.6969697  0.6969697
 0.52651515 0.6969697  0.59090909 0.59090909]

mean value: 0.6299242424242424

key: train_roc_auc
value: [0.71744717 0.75157053 0.74662098 0.72234913 0.72620407 0.72648962
 0.7312964  0.74105273 0.73786408 0.72815534]

mean value: 0.7329050066628594

key: test_jcc
value: [0.58333333 0.375      0.28571429 0.35714286 0.53333333 0.53333333
 0.3125     0.53333333 0.4        0.30769231]

mean value: 0.4221382783882784

key: train_jcc
value: [0.53225806 0.5785124  0.57723577 0.53658537 0.52136752 0.54471545
 0.54545455 0.55833333 0.56097561 0.54471545]

mean value: 0.5500153503642167

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00882125 0.00910711 0.01014447 0.00869751 0.00881767 0.0097332
 0.00888371 0.00917006 0.00906801 0.00977707]

mean value: 0.009222006797790528

key: score_time
value: [0.01601529 0.01748419 0.01591444 0.00995803 0.00983834 0.01029491
 0.01002693 0.01065493 0.01047945 0.01068568]

mean value: 0.01213521957397461

key: test_mcc
value: [ 0.12406456 -0.05427825  0.22407133  0.39727608 -0.03816905  0.15096491
  0.23262105  0.21452908 -0.09245003 -0.18257419]

mean value: 0.0976055502533363

key: train_mcc
value: [0.54175    0.5037683  0.49637007 0.50824626 0.49294992 0.54256731
 0.54702284 0.44388387 0.495239   0.55433939]

mean value: 0.5126136961359685

key: test_accuracy
value: [0.56521739 0.47826087 0.60869565 0.69565217 0.47826087 0.56521739
 0.60869565 0.60869565 0.45454545 0.40909091]

mean value: 0.5472332015810277

key: train_accuracy
value: [0.77073171 0.75121951 0.74634146 0.75121951 0.74634146 0.77073171
 0.77073171 0.72195122 0.74757282 0.77669903]

mean value: 0.7553540137343121

key: test_fscore
value: [0.5        0.4        0.47058824 0.63157895 0.45454545 0.5
 0.57142857 0.66666667 0.4        0.38095238]

mean value: 0.49757602562556125

key: train_fscore
value: [0.76847291 0.74371859 0.73195876 0.73298429 0.74       0.76142132
 0.75132275 0.71921182 0.75       0.77      ]

mean value: 0.7469090449228885

key: test_precision
value: [0.55555556 0.44444444 0.66666667 0.75       0.5        0.625
 0.66666667 0.6        0.44444444 0.4       ]

mean value: 0.5652777777777778

key: train_precision
value: [0.78       0.77083333 0.78021978 0.79545455 0.75510204 0.78947368
 0.81609195 0.72277228 0.74285714 0.79381443]

mean value: 0.7746619191132057

key: test_recall
value: [0.45454545 0.36363636 0.36363636 0.54545455 0.41666667 0.41666667
 0.5        0.75       0.36363636 0.36363636]

mean value: 0.4537878787878788

key: train_recall
value: [0.75728155 0.7184466  0.68932039 0.67961165 0.7254902  0.73529412
 0.69607843 0.71568627 0.75728155 0.74757282]

mean value: 0.722206358271464

key: test_roc_auc
value: [0.56060606 0.47348485 0.59848485 0.68939394 0.48106061 0.5719697
 0.61363636 0.60227273 0.45454545 0.40909091]

mean value: 0.5454545454545454

key: train_roc_auc
value: [0.77079764 0.75138016 0.74662098 0.75157053 0.74624024 0.77055968
 0.77036931 0.72192081 0.74757282 0.77669903]

mean value: 0.7553731201218352

key: test_jcc
value: [0.33333333 0.25       0.30769231 0.46153846 0.29411765 0.33333333
 0.4        0.5        0.25       0.23529412]

mean value: 0.3365309200603318

key: train_jcc
value: [0.624      0.592      0.57723577 0.5785124  0.58730159 0.6147541
 0.60169492 0.56153846 0.6        0.62601626]

mean value: 0.5963053491669482

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01364207 0.01871586 0.01185751 0.01367044 0.01809502 0.01582408
 0.01185703 0.01197171 0.0172205  0.0127852 ]

mean value: 0.014563941955566406

key: score_time
value: [0.01032472 0.01589632 0.01048732 0.01059628 0.01564956 0.00968361
 0.00964212 0.01131916 0.01068449 0.00989556]

mean value: 0.011417913436889648

key: test_mcc
value: [0.47727273 0.56818182 0.31298622 0.38932432 0.47727273 0.58930667
 0.56490196 0.65909298 0.18898224 0.        ]

mean value: 0.4227321652529521

key: train_mcc
value: [0.81495251 0.73751939 0.82438607 0.74685628 0.71733345 0.74633543
 0.76638754 0.72814868 0.71088536 0.77761579]

mean value: 0.7570420501696654

key: test_accuracy
value: [0.73913043 0.7826087  0.65217391 0.69565217 0.73913043 0.7826087
 0.7826087  0.82608696 0.59090909 0.5       ]

mean value: 0.7090909090909091

key: train_accuracy
value: [0.90731707 0.86829268 0.91219512 0.87317073 0.85853659 0.87317073
 0.88292683 0.86341463 0.85436893 0.88834951]

mean value: 0.8781742836845844

key: test_fscore
value: [0.72727273 0.7826087  0.66666667 0.66666667 0.75       0.76190476
 0.8        0.84615385 0.64       0.52173913]

mean value: 0.7163012494751625

key: train_fscore
value: [0.90909091 0.86567164 0.91262136 0.87619048 0.85572139 0.87254902
 0.88       0.86666667 0.85981308 0.89099526]

mean value: 0.8789319810380724

key: test_precision
value: [0.72727273 0.75       0.61538462 0.7        0.75       0.88888889
 0.76923077 0.78571429 0.57142857 0.5       ]

mean value: 0.7057919857919858

key: train_precision
value: [0.89622642 0.8877551  0.91262136 0.85981308 0.86868687 0.87254902
 0.89795918 0.84259259 0.82882883 0.87037037]

mean value: 0.8737402824230579

key: test_recall
value: [0.72727273 0.81818182 0.72727273 0.63636364 0.75       0.66666667
 0.83333333 0.91666667 0.72727273 0.54545455]

mean value: 0.7348484848484849

key: train_recall
value: [0.9223301  0.84466019 0.91262136 0.89320388 0.84313725 0.87254902
 0.8627451  0.89215686 0.89320388 0.91262136]

mean value: 0.8849229011993147

key: test_roc_auc
value: [0.73863636 0.78409091 0.65530303 0.69318182 0.73863636 0.78787879
 0.78030303 0.8219697  0.59090909 0.5       ]

mean value: 0.7090909090909091

key: train_roc_auc
value: [0.90724348 0.86840853 0.91219303 0.87307253 0.85846183 0.87316771
 0.88282886 0.86355416 0.85436893 0.88834951]

mean value: 0.8781648581762802

key: test_jcc
value: [0.57142857 0.64285714 0.5        0.5        0.6        0.61538462
 0.66666667 0.73333333 0.47058824 0.35294118]

mean value: 0.5653199741435035

key: train_jcc
value: [0.83333333 0.76315789 0.83928571 0.77966102 0.74782609 0.77391304
 0.78571429 0.76470588 0.75409836 0.8034188 ]

mean value: 0.7845114421881593

MCC on Blind test: 0.46

Accuracy on Blind test: 0.73

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.26194549 1.00380898 0.56620073 0.398633   0.38738036 0.52770424
 0.82983422 0.21668768 0.39636707 0.45531631]

mean value: 0.6043878078460694

key: score_time
value: [0.01219726 0.01478028 0.0121758  0.01218724 0.01211095 0.01325655
 0.01218605 0.01219201 0.01216078 0.01219201]

mean value: 0.012543892860412598

key: test_mcc
value: [ 0.65151515  0.56490196  0.41096386  0.38932432  0.56879646  0.76277007
  0.83971912  0.12844577  0.09090909 -0.18257419]

mean value: 0.42247716177937644

key: train_mcc
value: [0.86600321 0.95126131 0.71892689 0.58007639 0.61699176 0.64768695
 0.76036002 0.52267493 0.58321184 0.80643358]

mean value: 0.7053626889094041

key: test_accuracy
value: [0.82608696 0.7826087  0.69565217 0.69565217 0.73913043 0.86956522
 0.91304348 0.56521739 0.54545455 0.40909091]

mean value: 0.7041501976284584

key: train_accuracy
value: [0.93170732 0.97560976 0.84390244 0.76097561 0.7902439  0.79512195
 0.87804878 0.75609756 0.79126214 0.90291262]

mean value: 0.8425882074354725

key: test_fscore
value: [0.81818182 0.76190476 0.72       0.66666667 0.66666667 0.88888889
 0.90909091 0.66666667 0.54545455 0.43478261]

mean value: 0.7078303532216575

key: train_fscore
value: [0.93457944 0.97584541 0.86440678 0.80478088 0.74556213 0.82926829
 0.87046632 0.77678571 0.78606965 0.9047619 ]

mean value: 0.8492526520928274

key: test_precision
value: [0.81818182 0.8        0.64285714 0.7        1.         0.8
 1.         0.55555556 0.54545455 0.41666667]

mean value: 0.7278715728715729

key: train_precision
value: [0.9009009  0.97115385 0.76691729 0.68243243 0.94029851 0.70833333
 0.92307692 0.71311475 0.80612245 0.88785047]

mean value: 0.8300200906960877

key: test_recall
value: [0.81818182 0.72727273 0.81818182 0.63636364 0.5        1.
 0.83333333 0.83333333 0.54545455 0.45454545]

mean value: 0.7166666666666667

key: train_recall
value: [0.97087379 0.98058252 0.99029126 0.98058252 0.61764706 1.
 0.82352941 0.85294118 0.76699029 0.9223301 ]

mean value: 0.8905768132495717

key: test_roc_auc
value: [0.82575758 0.78030303 0.70075758 0.69318182 0.75       0.86363636
 0.91666667 0.5530303  0.54545455 0.40909091]

mean value: 0.7037878787878787

key: train_roc_auc
value: [0.93151532 0.97558538 0.84318485 0.75989911 0.78940605 0.7961165
 0.87778412 0.75656768 0.79126214 0.90291262]

mean value: 0.8424233771178374

key: test_jcc
value: [0.69230769 0.61538462 0.5625     0.5        0.5        0.8
 0.83333333 0.5        0.375      0.27777778]

mean value: 0.5656303418803419

key: train_jcc
value: [0.87719298 0.95283019 0.76119403 0.67333333 0.59433962 0.70833333
 0.7706422  0.6350365  0.64754098 0.82608696]

mean value: 0.7446530128607832

MCC on Blind test: 0.29

Accuracy on Blind test: 0.64

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01721716 0.01397586 0.01314878 0.01377201 0.01334882 0.01375103
 0.01316309 0.0134654  0.01368165 0.01338196]

mean value: 0.013890576362609864

key: score_time
value: [0.01574564 0.00902081 0.00891137 0.00908399 0.00947738 0.0089283
 0.00954652 0.00874829 0.00894642 0.00875568]

mean value: 0.009716439247131347

key: test_mcc
value: [0.65909298 0.82575758 0.65151515 1.         1.         0.91666667
 1.         0.74242424 0.73029674 1.        ]

mean value: 0.8525753362069441

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.82608696 0.91304348 0.82608696 1.         1.         0.95652174
 1.         0.86956522 0.86363636 1.        ]

mean value: 0.925494071146245

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        0.90909091 0.81818182 1.         1.         0.95652174
 1.         0.86956522 0.85714286 1.        ]

mean value: 0.9210502540937323

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.90909091 0.81818182 1.         1.         1.
 1.         0.90909091 0.9        1.        ]

mean value: 0.9425252525252525

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.90909091 0.81818182 1.         1.         0.91666667
 1.         0.83333333 0.81818182 1.        ]

mean value: 0.9022727272727273

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8219697  0.91287879 0.82575758 1.         1.         0.95833333
 1.         0.87121212 0.86363636 1.        ]

mean value: 0.9253787878787879

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 0.83333333 0.69230769 1.         1.         0.91666667
 1.         0.76923077 0.75       1.        ]

mean value: 0.8628205128205129

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.54

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09688735 0.09758043 0.09672451 0.0968442  0.09741735 0.09862638
 0.09707808 0.09623432 0.09967422 0.10141206]

mean value: 0.09784789085388183

key: score_time
value: [0.01796174 0.01781249 0.01779795 0.01884341 0.0174849  0.01741695
 0.01794338 0.01800895 0.0174613  0.017524  ]

mean value: 0.017825508117675783

key: test_mcc
value: [0.76764947 0.91666667 0.56818182 0.38932432 0.41096386 0.82575758
 0.82575758 0.83743579 0.83205029 0.45454545]

mean value: 0.6828332828329148

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.95652174 0.7826087  0.69565217 0.69565217 0.91304348
 0.91304348 0.91304348 0.90909091 0.72727273]

mean value: 0.8375494071146244

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88       0.95652174 0.7826087  0.66666667 0.66666667 0.91666667
 0.91666667 0.92307692 0.9        0.72727273]

mean value: 0.8336146751798925

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.78571429 0.91666667 0.75       0.7        0.77777778 0.91666667
 0.91666667 0.85714286 1.         0.72727273]

mean value: 0.8347907647907647

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.81818182 0.63636364 0.58333333 0.91666667
 0.91666667 1.         0.81818182 0.72727273]

mean value: 0.8416666666666667

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.875      0.95833333 0.78409091 0.69318182 0.70075758 0.91287879
 0.91287879 0.90909091 0.90909091 0.72727273]

mean value: 0.8382575757575758

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.78571429 0.91666667 0.64285714 0.5        0.5        0.84615385
 0.84615385 0.85714286 0.81818182 0.57142857]

mean value: 0.7284299034299034

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.32

Accuracy on Blind test: 0.64

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00908017 0.00915766 0.00907207 0.00912404 0.00897431 0.00912595
 0.00911784 0.00910449 0.01033187 0.00916791]

mean value: 0.00922563076019287

key: score_time
value: [0.00860286 0.00855422 0.00866795 0.00879788 0.00872707 0.00869775
 0.00873923 0.00875854 0.00952125 0.00869274]

mean value: 0.008775949478149414

key: test_mcc
value: [0.03816905 0.56490196 0.30240737 0.03178209 0.65151515 0.5164589
 0.38932432 0.74242424 0.56694671 0.36514837]

mean value: 0.416907815921681

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.52173913 0.7826087  0.65217391 0.52173913 0.82608696 0.73913043
 0.69565217 0.86956522 0.77272727 0.68181818]

mean value: 0.7063241106719368

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.47619048 0.76190476 0.6        0.42105263 0.83333333 0.7
 0.72       0.86956522 0.73684211 0.66666667]

mean value: 0.6785555192328647

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.8        0.66666667 0.5        0.83333333 0.875
 0.69230769 0.90909091 0.875      0.7       ]

mean value: 0.7351398601398601

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.45454545 0.72727273 0.54545455 0.36363636 0.83333333 0.58333333
 0.75       0.83333333 0.63636364 0.63636364]

mean value: 0.6363636363636364

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.51893939 0.78030303 0.64772727 0.51515152 0.82575758 0.74621212
 0.69318182 0.87121212 0.77272727 0.68181818]

mean value: 0.7053030303030303

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.3125     0.61538462 0.42857143 0.26666667 0.71428571 0.53846154
 0.5625     0.76923077 0.58333333 0.5       ]

mean value: 0.5290934065934066

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56
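
Note on the per-fold metrics above: the values logged for a fold are all functions of that fold's confusion matrix. The sketch below is a minimal illustration and not the pipeline's own code. It assumes counts of TP=8, FP=2, FN=3, TN=10 for the second Extra Tree fold; these counts are not logged and were inferred from the logged precision (0.8), recall (0.72727273) and accuracy (0.7826087). It also assumes the logged roc_auc was computed from hard 0/1 predictions, in which case it equals balanced accuracy.

```python
import math

# Counts inferred from the second fold's logged values (an assumption, not logged).
tp, fp, fn, tn = 8, 2, 3, 10


def binary_metrics(tp, fp, fn, tn):
    """Return the metrics the log reports, derived from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        # Matthews correlation coefficient; defined as 0 when the denominator is 0.
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


metrics = binary_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts, the computed values agree with the second entry of each test_* array logged for Extra Tree.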

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.33109474 1.30057764 1.29539728 1.27576232 1.28721762 1.28771162
 1.30014181 1.29339123 1.31685042 1.30856156]

mean value: 1.2996706247329712

key: score_time
value: [0.09498    0.09537101 0.09173775 0.0889883  0.09702682 0.0884161
 0.09529257 0.09618378 0.09623957 0.09434962]

mean value: 0.09385855197906494

key: test_mcc
value: [0.65909298 0.91666667 0.56818182 0.74047959 0.66414149 0.91666667
 0.91605722 0.74242424 0.81818182 0.73029674]

mean value: 0.7672189240726363

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.82608696 0.95652174 0.7826087  0.86956522 0.82608696 0.95652174
 0.95652174 0.86956522 0.90909091 0.86363636]

mean value: 0.8816205533596838

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.8        0.95652174 0.7826087  0.85714286 0.81818182 0.95652174
 0.96       0.86956522 0.90909091 0.85714286]

mean value: 0.876677583286279

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88888889 0.91666667 0.75       0.9        0.9        1.
 0.92307692 0.90909091 0.90909091 0.9       ]

mean value: 0.8996814296814297

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 1.         0.81818182 0.81818182 0.75       0.91666667
 1.         0.83333333 0.90909091 0.81818182]

mean value: 0.8590909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8219697  0.95833333 0.78409091 0.86742424 0.82954545 0.95833333
 0.95454545 0.87121212 0.90909091 0.86363636]

mean value: 0.8818181818181818

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.66666667 0.91666667 0.64285714 0.75       0.69230769 0.91666667
 0.92307692 0.76923077 0.83333333 0.75      ]

mean value: 0.7860805860805861

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.3

Accuracy on Blind test: 0.63
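
Note on the "mean value" lines: each one is the arithmetic mean of the ten per-fold values printed just before it. The sketch below is a minimal illustration, not the pipeline's own code. It recomputes the Random Forest test_mcc mean from the rounded values shown in the log. The sample standard deviation is not part of the log and is added here only as an assumed, convenient measure of fold-to-fold spread.

```python
from statistics import mean, stdev

# Per-fold test_mcc values for Random Forest, copied from the log above
# (rounded to 8 decimals as printed).
test_mcc = [
    0.65909298, 0.91666667, 0.56818182, 0.74047959, 0.66414149,
    0.91666667, 0.91605722, 0.74242424, 0.81818182, 0.73029674,
]

fold_mean = mean(test_mcc)   # corresponds to the logged "mean value"
fold_sd = stdev(test_mcc)    # sample standard deviation across folds (not logged)

print(f"n folds: {len(test_mcc)}")
print(f"mean test_mcc: {fold_mean:.6f}")
print(f"sd test_mcc:   {fold_sd:.6f}")
```

Because the inputs are the rounded printed values, the recomputed mean matches the logged one only to about seven decimal places.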

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.93184686 0.88155961 0.97611141 0.94901252 0.89592099 0.96174765
 0.93060923 0.96457934 0.91490054 0.93486118]

mean value: 0.934114933013916

key: score_time
value: [0.24499297 0.24134612 0.18891835 0.13461995 0.24500108 0.2090826
 0.13763881 0.20907116 0.14146399 0.24355125]

mean value: 0.19956862926483154

key: test_mcc
value: [0.65909298 0.76764947 0.58930667 0.65909298 0.74242424 0.83971912
 0.82575758 0.83971912 0.73029674 0.54772256]

mean value: 0.7200781469442072

key: train_mcc
value: [0.9707786  0.94163576 0.97114302 0.961154   0.96116136 0.9707786
 0.95163291 0.96116136 0.95150116 0.95186015]

mean value: 0.9592806896568655

key: test_accuracy
value: [0.82608696 0.86956522 0.7826087  0.82608696 0.86956522 0.91304348
 0.91304348 0.91304348 0.86363636 0.77272727]

mean value: 0.8549407114624505

key: train_accuracy
value: [0.98536585 0.97073171 0.98536585 0.9804878  0.9804878  0.98536585
 0.97560976 0.9804878  0.97572816 0.97572816]

mean value: 0.9795358749704002

key: test_fscore
value: [0.8        0.88       0.8        0.8        0.86956522 0.90909091
 0.91666667 0.90909091 0.86956522 0.7826087 ]

mean value: 0.8536587615283268

key: train_fscore
value: [0.98536585 0.97115385 0.98564593 0.98076923 0.98058252 0.98536585
 0.97584541 0.98058252 0.97584541 0.97607656]

mean value: 0.9797233142078156

key: test_precision
value: [0.88888889 0.78571429 0.71428571 0.88888889 0.90909091 1.
 0.91666667 1.         0.83333333 0.75      ]

mean value: 0.8686868686868687

key: train_precision
value: [0.99019608 0.96190476 0.97169811 0.97142857 0.97115385 0.98058252
 0.96190476 0.97115385 0.97115385 0.96226415]

mean value: 0.9713440500553794

key: test_recall
value: [0.72727273 1.         0.90909091 0.72727273 0.83333333 0.83333333
 0.91666667 0.83333333 0.90909091 0.81818182]

mean value: 0.8507575757575758

key: train_recall
value: [0.98058252 0.98058252 1.         0.99029126 0.99019608 0.99019608
 0.99019608 0.99019608 0.98058252 0.99029126]

mean value: 0.9883114410812869

key: test_roc_auc
value: [0.8219697  0.875      0.78787879 0.8219697  0.87121212 0.91666667
 0.91287879 0.91666667 0.86363636 0.77272727]

mean value: 0.8560606060606061

key: train_roc_auc
value: [0.9853893  0.97068342 0.98529412 0.98043975 0.98053493 0.9853893
 0.97568056 0.98053493 0.97572816 0.97572816]

mean value: 0.9795402627070245

key: test_jcc
value: [0.66666667 0.78571429 0.66666667 0.66666667 0.76923077 0.83333333
 0.84615385 0.83333333 0.76923077 0.64285714]

mean value: 0.747985347985348

key: train_jcc
value: [0.97115385 0.94392523 0.97169811 0.96226415 0.96190476 0.97115385
 0.95283019 0.96190476 0.95283019 0.95327103]

mean value: 0.9602936119308894

MCC on Blind test: 0.38

Accuracy on Blind test: 0.67

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02315831 0.00965405 0.00990772 0.00992966 0.00958633 0.00925899
 0.00968218 0.00935221 0.00954723 0.00899029]

mean value: 0.010906696319580078

key: score_time
value: [0.010185   0.0090487  0.00989842 0.00874281 0.00877428 0.00885487
 0.00888062 0.00934005 0.00959086 0.009305  ]

mean value: 0.00926206111907959

key: test_mcc
value: [0.58002308 0.12878788 0.12336594 0.21452908 0.39393939 0.39393939
 0.05427825 0.39393939 0.18257419 0.20412415]

mean value: 0.26695007377892416

key: train_mcc
value: [0.43994849 0.50824626 0.49637007 0.45056913 0.46832513 0.45757548
 0.46948042 0.48928361 0.48018451 0.46191786]

mean value: 0.47219009507999327

key: test_accuracy
value: [0.7826087  0.56521739 0.56521739 0.60869565 0.69565217 0.69565217
 0.52173913 0.69565217 0.59090909 0.59090909]

mean value: 0.6312252964426878

key: train_accuracy
value: [0.71707317 0.75121951 0.74634146 0.72195122 0.72682927 0.72682927
 0.73170732 0.74146341 0.73786408 0.72815534]

mean value: 0.7329434051622069

key: test_fscore
value: [0.73684211 0.54545455 0.44444444 0.52631579 0.69565217 0.69565217
 0.47619048 0.69565217 0.57142857 0.47058824]

mean value: 0.5858220689288127

key: train_fscore
value: [0.69473684 0.73298429 0.73195876 0.6984127  0.68539326 0.70526316
 0.70588235 0.71657754 0.71875    0.70526316]

mean value: 0.7095222063862845

key: test_precision
value: [0.875      0.54545455 0.57142857 0.625      0.72727273 0.72727273
 0.55555556 0.72727273 0.6        0.66666667]

mean value: 0.6620923520923521

key: train_precision
value: [0.75862069 0.79545455 0.78021978 0.76744186 0.80263158 0.76136364
 0.77647059 0.78823529 0.7752809  0.77011494]

mean value: 0.7775833814863701

key: test_recall
value: [0.63636364 0.54545455 0.36363636 0.45454545 0.66666667 0.66666667
 0.41666667 0.66666667 0.54545455 0.36363636]

mean value: 0.5325757575757576

key: train_recall
value: [0.6407767  0.67961165 0.68932039 0.6407767  0.59803922 0.65686275
 0.64705882 0.65686275 0.66990291 0.65048544]

mean value: 0.6529697315819532

key: test_roc_auc
value: [0.77651515 0.56439394 0.55681818 0.60227273 0.6969697  0.6969697
 0.52651515 0.6969697  0.59090909 0.59090909]

mean value: 0.6299242424242424

key: train_roc_auc
value: [0.71744717 0.75157053 0.74662098 0.72234913 0.72620407 0.72648962
 0.7312964  0.74105273 0.73786408 0.72815534]

mean value: 0.7329050066628594

key: test_jcc
value: [0.58333333 0.375      0.28571429 0.35714286 0.53333333 0.53333333
 0.3125     0.53333333 0.4        0.30769231]

mean value: 0.4221382783882784

key: train_jcc
value: [0.53225806 0.5785124  0.57723577 0.53658537 0.52136752 0.54471545
 0.54545455 0.55833333 0.56097561 0.54471545]

mean value: 0.5500153503642167

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07360411 0.05273223 0.05829406 0.06245947 0.0599854  0.05718923
 0.06659317 0.06024432 0.06122375 0.06588459]

mean value: 0.06182103157043457

key: score_time
value: [0.01040006 0.01061964 0.01049256 0.01047063 0.01048827 0.01023436
 0.01153612 0.01126742 0.0113802  0.01140666]

mean value: 0.010829591751098632

key: test_mcc
value: [0.91666667 0.91666667 0.74242424 1.         0.83971912 0.83971912
 1.         1.         0.91287093 1.        ]

mean value: 0.9168066750452115

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95652174 0.95652174 0.86956522 1.         0.91304348 0.91304348
 1.         1.         0.95454545 1.        ]

mean value: 0.9563241106719368

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95652174 0.95652174 0.86956522 1.         0.90909091 0.90909091
 1.         1.         0.95652174 1.        ]

mean value: 0.9557312252964427

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91666667 0.91666667 0.83333333 1.         1.         1.
 1.         1.         0.91666667 1.        ]

mean value: 0.9583333333333334

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.90909091 1.         0.83333333 0.83333333
 1.         1.         1.         1.        ]

mean value: 0.9575757575757575

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95833333 0.95833333 0.87121212 1.         0.91666667 0.91666667
 1.         1.         0.95454545 1.        ]

mean value: 0.9575757575757575

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.91666667 0.91666667 0.76923077 1.         0.83333333 0.83333333
 1.         1.         0.91666667 1.        ]

mean value: 0.9185897435897435

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.53
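
Note on cross-validation versus blind-test scores: for the five models logged above, the mean cross-validated test_mcc and the blind-test MCC can be compared directly. The sketch below is a minimal illustration with the numbers copied from this log. The "gap" column is not something the pipeline reports; it is introduced here as an assumed, simple way to rank how much each model's score drops on the blind set.

```python
# (model name, mean CV test_mcc, blind-test MCC), copied from the log above.
results = [
    ("Extra Tree", 0.416907815921681, 0.13),
    ("Random Forest", 0.7672189240726363, 0.30),
    ("Random Forest2", 0.7200781469442072, 0.38),
    ("Naive Bayes", 0.26695007377892416, 0.26),
    ("XGBoost", 0.9168066750452115, 0.08),
]

# Gap = CV mean minus blind-test MCC; a larger gap means a larger drop.
gaps = sorted(
    ((name, cv_mcc - blind_mcc) for name, cv_mcc, blind_mcc in results),
    key=lambda item: item[1],
    reverse=True,
)

for name, gap in gaps:
    print(f"{name:<15} CV-to-blind MCC gap: {gap:.2f}")
```

A large gap together with train_* scores of 1.0 is consistent with overfitting to the training folds, though the log alone does not establish the cause.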

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.03170872 0.05358624 0.06331277 0.06383777 0.07470703 0.05675173
 0.0568521  0.05640078 0.05648899 0.05701303]

mean value: 0.057065916061401364

key: score_time
value: [0.02100611 0.0222826  0.02220082 0.02002335 0.02134395 0.02366328
 0.02332687 0.01963353 0.02289724 0.02070689]

mean value: 0.021708464622497557

key: test_mcc
value: [0.76277007 0.56490196 0.58930667 0.48075018 0.6992059  0.82575758
 0.56490196 0.58930667 0.91287093 0.56694671]

mean value: 0.6556718603789258

key: train_mcc
value: [0.92211753 0.9707786  0.94146202 0.92211753 0.94164684 0.93175328
 0.95163291 0.95126594 0.91266437 0.94192516]

mean value: 0.938736418639331

key: test_accuracy
value: [0.86956522 0.7826087  0.7826087  0.73913043 0.82608696 0.91304348
 0.7826087  0.7826087  0.95454545 0.77272727]

mean value: 0.8205533596837945

key: train_accuracy
value: [0.96097561 0.98536585 0.97073171 0.96097561 0.97073171 0.96585366
 0.97560976 0.97560976 0.95631068 0.97087379]

mean value: 0.9693038124556003

key: test_fscore
value: [0.84210526 0.76190476 0.8        0.7        0.8        0.91666667
 0.8        0.76190476 0.95238095 0.73684211]

mean value: 0.8071804511278196

key: train_fscore
value: [0.96153846 0.98536585 0.97087379 0.96153846 0.97087379 0.96585366
 0.97584541 0.97560976 0.95652174 0.97115385]

mean value: 0.969517476009744

key: test_precision
value: [1.         0.8        0.71428571 0.77777778 1.         0.91666667
 0.76923077 0.88888889 1.         0.875     ]

mean value: 0.8741849816849817

key: train_precision
value: [0.95238095 0.99019608 0.97087379 0.95238095 0.96153846 0.96116505
 0.96190476 0.97087379 0.95192308 0.96190476]

mean value: 0.9635141666823563

key: test_recall
value: [0.72727273 0.72727273 0.90909091 0.63636364 0.66666667 0.91666667
 0.83333333 0.66666667 0.90909091 0.63636364]

mean value: 0.7628787878787878

key: train_recall
value: [0.97087379 0.98058252 0.97087379 0.97087379 0.98039216 0.97058824
 0.99019608 0.98039216 0.96116505 0.98058252]

mean value: 0.975652008376166

key: test_roc_auc
value: [0.86363636 0.78030303 0.78787879 0.73484848 0.83333333 0.91287879
 0.78030303 0.78787879 0.95454545 0.77272727]

mean value: 0.8208333333333333

key: train_roc_auc
value: [0.96092709 0.9853893  0.97073101 0.96092709 0.9707786  0.96587664
 0.97568056 0.97563297 0.95631068 0.97087379]

mean value: 0.9693127736531506

key: test_jcc
value: [0.72727273 0.61538462 0.66666667 0.53846154 0.66666667 0.84615385
 0.66666667 0.61538462 0.90909091 0.58333333]

mean value: 0.6835081585081585

key: train_jcc
value: [0.92592593 0.97115385 0.94339623 0.92592593 0.94339623 0.93396226
 0.95283019 0.95238095 0.91666667 0.94392523]

mean value: 0.9409563456358554

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0219748  0.01021671 0.01002097 0.009799   0.00974965 0.00970197
 0.00891066 0.00987864 0.00947976 0.00913072]

mean value: 0.010886287689208985

key: score_time
value: [0.01003003 0.00961876 0.00944829 0.00925136 0.00877619 0.00935459
 0.00935292 0.00873876 0.00937414 0.00935054]

mean value: 0.009329557418823242

key: test_mcc
value: [0.47727273 0.39393939 0.39393939 0.38932432 0.30240737 0.56818182
 0.56490196 0.02585438 0.09759001 0.09245003]

mean value: 0.3305861402074863

key: train_mcc
value: [0.41481375 0.48545031 0.46581391 0.4461775  0.40046964 0.42940367
 0.47412116 0.45056913 0.44098577 0.50679276]

mean value: 0.4514597602525188

key: test_accuracy
value: [0.73913043 0.69565217 0.69565217 0.69565217 0.65217391 0.7826087
 0.7826087  0.52173913 0.54545455 0.54545455]

mean value: 0.6656126482213438

key: train_accuracy
value: [0.70243902 0.74146341 0.73170732 0.72195122 0.69756098 0.71219512
 0.73658537 0.72195122 0.7184466  0.75242718]

mean value: 0.7236727444944352

key: test_fscore
value: [0.72727273 0.69565217 0.69565217 0.66666667 0.69230769 0.7826087
 0.8        0.62068966 0.61538462 0.58333333]

mean value: 0.687956773361571

key: train_fscore
value: [0.73362445 0.75576037 0.74654378 0.73732719 0.71818182 0.73059361
 0.74285714 0.74208145 0.73636364 0.7627907 ]

mean value: 0.7406124140900755

key: test_precision
value: [0.72727273 0.66666667 0.66666667 0.7        0.64285714 0.81818182
 0.76923077 0.52941176 0.53333333 0.53846154]

mean value: 0.6592082427376545

key: train_precision
value: [0.66666667 0.71929825 0.71052632 0.70175439 0.66949153 0.68376068
 0.72222222 0.68907563 0.69230769 0.73214286]

mean value: 0.6987246225144372

key: test_recall
value: [0.72727273 0.72727273 0.72727273 0.63636364 0.75       0.75
 0.83333333 0.75       0.72727273 0.63636364]

mean value: 0.7265151515151516

key: train_recall
value: [0.81553398 0.7961165  0.78640777 0.77669903 0.7745098  0.78431373
 0.76470588 0.80392157 0.78640777 0.7961165 ]

mean value: 0.7884732533790215

key: test_roc_auc
value: [0.73863636 0.6969697  0.6969697  0.69318182 0.64772727 0.78409091
 0.78030303 0.51136364 0.54545455 0.54545455]

mean value: 0.6640151515151516

key: train_roc_auc
value: [0.70188464 0.74119551 0.73143918 0.72168285 0.69793451 0.71254521
 0.73672187 0.72234913 0.7184466  0.75242718]

mean value: 0.7236626689510756

key: test_jcc
value: [0.57142857 0.53333333 0.53333333 0.5        0.52941176 0.64285714
 0.66666667 0.45       0.44444444 0.41176471]

mean value: 0.5283239962651727

key: train_jcc
value: [0.57931034 0.60740741 0.59558824 0.58394161 0.56028369 0.57553957
 0.59090909 0.58992806 0.58273381 0.61654135]

mean value: 0.5882183164453261

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01108789 0.0161562  0.01473117 0.01652384 0.01481438 0.015769
 0.01692629 0.01509166 0.01592302 0.01968622]

mean value: 0.015670967102050782

key: score_time
value: [0.00860429 0.01099896 0.01096487 0.01157355 0.01150799 0.01149702
 0.011621   0.01159811 0.01157045 0.01161623]

mean value: 0.011155247688293457

key: test_mcc
value: [0.62050523 0.66414149 0.48856385 0.69084928 0.83971912 0.63327851
 0.74047959 0.56490196 0.48795004 0.40824829]

mean value: 0.6138637350421967

key: train_mcc
value: [0.64013725 0.94146202 0.961154   0.79610703 0.88361919 0.72360351
 0.88909823 0.82136935 0.71743005 0.88083033]

mean value: 0.8254810956558689

key: test_accuracy
value: [0.7826087  0.82608696 0.73913043 0.82608696 0.91304348 0.7826087
 0.86956522 0.7826087  0.72727273 0.68181818]

mean value: 0.7930830039525691

key: train_accuracy
value: [0.7902439  0.97073171 0.9804878  0.88780488 0.94146341 0.84390244
 0.94146341 0.90731707 0.83980583 0.9368932 ]

mean value: 0.9040113663272555

key: test_fscore
value: [0.70588235 0.83333333 0.75       0.77777778 0.90909091 0.73684211
 0.88       0.8        0.76923077 0.74074074]

mean value: 0.7902897988377864

key: train_fscore
value: [0.73619632 0.97087379 0.98076923 0.87431694 0.94230769 0.81395349
 0.94444444 0.9124424  0.86192469 0.94063927]

mean value: 0.8977868253122568

key: test_precision
value: [1.         0.76923077 0.69230769 1.         1.         1.
 0.84615385 0.76923077 0.66666667 0.625     ]

mean value: 0.8368589743589744

key: train_precision
value: [1.         0.97087379 0.97142857 1.         0.9245283  1.
 0.89473684 0.86086957 0.75735294 0.88793103]

mean value: 0.9267721042705015

key: test_recall
value: [0.54545455 0.90909091 0.81818182 0.63636364 0.83333333 0.58333333
 0.91666667 0.83333333 0.90909091 0.90909091]

mean value: 0.7893939393939394

key: train_recall
value: [0.58252427 0.97087379 0.99029126 0.77669903 0.96078431 0.68627451
 1.         0.97058824 1.         1.        ]

mean value: 0.8938035408338092

key: test_roc_auc
value: [0.77272727 0.82954545 0.74242424 0.81818182 0.91666667 0.79166667
 0.86742424 0.78030303 0.72727273 0.68181818]

mean value: 0.7928030303030303

key: train_roc_auc
value: [0.79126214 0.97073101 0.98043975 0.88834951 0.94155721 0.84313725
 0.94174757 0.90762421 0.83980583 0.9368932 ]

mean value: 0.904154768703598

key: test_jcc
value: [0.54545455 0.71428571 0.6        0.63636364 0.83333333 0.58333333
 0.78571429 0.66666667 0.625      0.58823529]

mean value: 0.6578386809269162

key: train_jcc
value: [0.58252427 0.94339623 0.96226415 0.77669903 0.89090909 0.68627451
 0.89473684 0.83898305 0.75735294 0.88793103]

mean value: 0.8221071147654326

MCC on Blind test: 0.33

Accuracy on Blind test: 0.63

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01452732 0.01389766 0.01486945 0.0147233  0.01471424 0.01462889
 0.0140667  0.01539159 0.01441121 0.01535964]

mean value: 0.0146589994430542

key: score_time
value: [0.01154995 0.0115397  0.01155138 0.01168466 0.01158977 0.01151705
 0.01147985 0.01160812 0.01153541 0.01155424]

mean value: 0.011561012268066407

key: test_mcc
value: [0.40451992 0.66414149 0.33371191 0.76764947 0.6992059  0.74242424
 0.65151515 0.83971912 0.68313005 0.40824829]

mean value: 0.6194265542300403

key: train_mcc
value: [0.4515346  0.84539215 0.89473501 0.72360351 0.73146795 0.88558308
 0.7922197  0.91330072 0.81319759 0.69427256]

mean value: 0.774530686441988

key: test_accuracy
value: [0.65217391 0.82608696 0.65217391 0.86956522 0.82608696 0.86956522
 0.82608696 0.91304348 0.81818182 0.68181818]

mean value: 0.7934782608695652

key: train_accuracy
value: [0.66829268 0.92195122 0.94634146 0.84390244 0.84878049 0.94146341
 0.88780488 0.95609756 0.89805825 0.82524272]

mean value: 0.873793511721525

key: test_fscore
value: [0.42857143 0.83333333 0.69230769 0.88       0.8        0.86956522
 0.83333333 0.90909091 0.84615385 0.58823529]

mean value: 0.7680591054299494

key: train_fscore
value: [0.50724638 0.92       0.94835681 0.86554622 0.82080925 0.93877551
 0.87431694 0.9569378  0.90748899 0.78823529]

mean value: 0.8527713181405282

key: test_precision
value: [1.         0.76923077 0.6        0.78571429 1.         0.90909091
 0.83333333 1.         0.73333333 0.83333333]

mean value: 0.8464035964035964

key: train_precision
value: [1.         0.94845361 0.91818182 0.76296296 1.         0.9787234
 0.98765432 0.93457944 0.83064516 1.        ]

mean value: 0.9361200715177836

key: test_recall
value: [0.27272727 0.90909091 0.81818182 1.         0.66666667 0.83333333
 0.83333333 0.83333333 1.         0.45454545]

mean value: 0.7621212121212121

key: train_recall
value: [0.33980583 0.89320388 0.98058252 1.         0.69607843 0.90196078
 0.78431373 0.98039216 1.         0.65048544]

mean value: 0.8226822767942128

key: test_roc_auc
value: [0.63636364 0.82954545 0.65909091 0.875      0.83333333 0.87121212
 0.82575758 0.91666667 0.81818182 0.68181818]

mean value: 0.7946969696969697

key: train_roc_auc
value: [0.66990291 0.92209214 0.94617362 0.84313725 0.84803922 0.94127165
 0.88730249 0.9562155  0.89805825 0.82524272]

mean value: 0.8737435750999429

key: test_jcc
value: [0.27272727 0.71428571 0.52941176 0.78571429 0.66666667 0.76923077
 0.71428571 0.83333333 0.73333333 0.41666667]

mean value: 0.6435655520949639

key: train_jcc
value: [0.33980583 0.85185185 0.90178571 0.76296296 0.69607843 0.88461538
 0.77669903 0.91743119 0.83064516 0.65048544]

mean value: 0.7612360990301472

MCC on Blind test: 0.3

Accuracy on Blind test: 0.64

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.13866186 0.11934733 0.11773157 0.11565089 0.11581945 0.117486
 0.11805749 0.12116289 0.11758924 0.11728621]

mean value: 0.11987929344177246

key: score_time
value: [0.0161581  0.01547503 0.01618123 0.01490808 0.01621389 0.01644397
 0.01605368 0.01637292 0.01620722 0.01623797]

mean value: 0.016025209426879884

key: test_mcc
value: [0.74047959 0.82575758 0.74242424 0.91605722 0.83971912 0.83971912
 1.         0.66414149 0.91287093 0.81818182]

mean value: 0.8299351113957522

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.91304348 0.86956522 0.95652174 0.91304348 0.91304348
 1.         0.82608696 0.95454545 0.90909091]

mean value: 0.9124505928853754

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.90909091 0.86956522 0.95238095 0.90909091 0.90909091
 1.         0.81818182 0.95238095 0.90909091]

mean value: 0.9086015433841521

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.90909091 0.83333333 1.         1.         1.
 1.         0.9        1.         0.90909091]

mean value: 0.9451515151515152

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.83333333
 1.         0.75       0.90909091 0.90909091]

mean value: 0.878030303030303

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.91287879 0.87121212 0.95454545 0.91666667 0.91666667
 1.         0.82954545 0.95454545 0.90909091]

mean value: 0.9132575757575758

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.83333333 0.76923077 0.90909091 0.83333333 0.83333333
 1.         0.69230769 0.90909091 0.83333333]

mean value: 0.8363053613053613

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.01

Accuracy on Blind test: 0.5

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04288006 0.03579044 0.03872585 0.05015779 0.04801965 0.05010223
 0.04300642 0.05224681 0.04316568 0.06040478]

mean value: 0.04644997119903564

key: score_time
value: [0.01788092 0.0261817  0.01968694 0.03280926 0.02911305 0.04006219
 0.02331567 0.02224922 0.0289259  0.03248787]

mean value: 0.027271270751953125

key: test_mcc
value: [0.58002308 1.         0.65151515 1.         0.83971912 0.83971912
 0.83971912 0.91666667 0.91287093 0.83205029]

mean value: 0.8412283485098235

key: train_mcc
value: [0.98067587 0.99029126 0.98067587 1.         0.99029034 1.
 0.99029034 0.98067223 0.99033794 0.98076744]

mean value: 0.9884001294873583

key: test_accuracy
value: [0.7826087  1.         0.82608696 1.         0.91304348 0.91304348
 0.91304348 0.95652174 0.95454545 0.90909091]

mean value: 0.916798418972332

key: train_accuracy
value: [0.9902439  0.99512195 0.9902439  1.         0.99512195 1.
 0.99512195 0.9902439  0.99514563 0.99029126]

mean value: 0.9941534454179494

key: test_fscore
value: [0.73684211 1.         0.81818182 1.         0.90909091 0.90909091
 0.90909091 0.95652174 0.95238095 0.9       ]

mean value: 0.909119934222909

key: train_fscore
value: [0.99019608 0.99512195 0.99019608 1.         0.99507389 1.
 0.99507389 0.99009901 0.99512195 0.99019608]

mean value: 0.9941078930885363

key: test_precision
value: [0.875      1.         0.81818182 1.         1.         1.
 1.         1.         1.         1.        ]

mean value: 0.9693181818181819

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 1.         0.81818182 1.         0.83333333 0.83333333
 0.83333333 0.91666667 0.90909091 0.81818182]

mean value: 0.8598484848484849

key: train_recall
value: [0.98058252 0.99029126 0.98058252 1.         0.99019608 1.
 0.99019608 0.98039216 0.99029126 0.98058252]

mean value: 0.9883114410812869

key: test_roc_auc
value: [0.77651515 1.         0.82575758 1.         0.91666667 0.91666667
 0.91666667 0.95833333 0.95454545 0.90909091]

mean value: 0.9174242424242425

key: train_roc_auc
value: [0.99029126 0.99514563 0.99029126 1.         0.99509804 1.
 0.99509804 0.99019608 0.99514563 0.99029126]

mean value: 0.9941557205406435

key: test_jcc
value: [0.58333333 1.         0.69230769 1.         0.83333333 0.83333333
 0.83333333 0.91666667 0.90909091 0.81818182]

mean value: 0.8419580419580419

key: train_jcc
value: [0.98058252 0.99029126 0.98058252 1.         0.99019608 1.
 0.99019608 0.98039216 0.99029126 0.98058252]

mean value: 0.9883114410812869

MCC on Blind test: 0.05

Accuracy on Blind test: 0.52

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.02399015 0.02728724 0.03085113 0.06025457 0.07673955 0.06412292
 0.06474566 0.06357622 0.0647521  0.06365848]

mean value: 0.053997802734375

key: score_time
value: [0.0126431  0.0125792  0.01255274 0.02063203 0.02428436 0.02345872
 0.02092385 0.0228548  0.02434254 0.02310681]

mean value: 0.019737815856933592

key: test_mcc
value: [0.38932432 0.47727273 0.21452908 0.30240737 0.66414149 0.76764947
 0.5164589  0.74047959 0.54232614 0.2773501 ]

mean value: 0.489193919560136

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.69565217 0.73913043 0.60869565 0.65217391 0.82608696 0.86956522
 0.73913043 0.86956522 0.72727273 0.63636364]

mean value: 0.7363636363636363

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.72727273 0.52631579 0.6        0.81818182 0.85714286
 0.7        0.88       0.625      0.6       ]

mean value: 0.7000579858737753

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.72727273 0.625      0.66666667 0.9        1.
 0.875      0.84615385 1.         0.66666667]

mean value: 0.8006759906759907

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.72727273 0.45454545 0.54545455 0.75       0.75
 0.58333333 0.91666667 0.45454545 0.54545455]

mean value: 0.6363636363636364

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.69318182 0.73863636 0.60227273 0.64772727 0.82954545 0.875
 0.74621212 0.86742424 0.72727273 0.63636364]

mean value: 0.7363636363636363

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.57142857 0.35714286 0.42857143 0.69230769 0.75
 0.53846154 0.78571429 0.45454545 0.42857143]

mean value: 0.5506743256743256

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.22

Accuracy on Blind test: 0.61

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.36353517 0.35397005 0.34888005 0.34964776 0.35780954 0.35491943
 0.35081267 0.35545444 0.35240722 0.34948468]

mean value: 0.35369210243225097

key: score_time
value: [0.00919867 0.0091424  0.00909376 0.00912738 0.00930524 0.00908327
 0.0090704  0.00978851 0.00921178 0.00905943]

mean value: 0.009208083152770996

key: test_mcc
value: [0.74047959 0.91666667 0.74242424 1.         0.76764947 1.
 1.         0.91666667 0.91287093 1.        ]

mean value: 0.8996757568581448

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.95652174 0.86956522 1.         0.86956522 1.
 1.         0.95652174 0.95454545 1.        ]

mean value: 0.9476284584980237

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.95652174 0.86956522 1.         0.85714286 1.
 1.         0.95652174 0.95652174 1.        ]

mean value: 0.9453416149068323

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.91666667 0.83333333 1.         1.         1.
 1.         1.         0.91666667 1.        ]

mean value: 0.9566666666666667

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.90909091 1.         0.75       1.
 1.         0.91666667 1.         1.        ]

mean value: 0.9393939393939394

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.95833333 0.87121212 1.         0.875      1.
 1.         0.95833333 0.95454545 1.        ]

mean value: 0.9484848484848485

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.91666667 0.76923077 1.         0.75       1.
 1.         0.91666667 0.91666667 1.        ]

mean value: 0.9019230769230769

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.54

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.02362514 0.0202353  0.0199132  0.02025294 0.0201211  0.02020693
 0.02033591 0.02034235 0.01992464 0.02041364]

mean value: 0.02053711414337158

key: score_time
value: [0.01706982 0.01200247 0.01434779 0.01835537 0.017483   0.02034879
 0.0230217  0.01707029 0.01792383 0.01838613]

mean value: 0.01760091781616211

key: test_mcc
value: [0.56879646 0.6992059  0.37080992 0.50460839 0.76277007 0.76277007
 0.69084928 0.83743579 0.64715023 0.75592895]

mean value: 0.6600325061286204

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.82608696 0.60869565 0.69565217 0.86956522 0.86956522
 0.82608696 0.91304348 0.81818182 0.86363636]

mean value: 0.8029644268774704

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.78571429 0.84615385 0.70967742 0.75862069 0.88888889 0.88888889
 0.85714286 0.92307692 0.83333333 0.88      ]

mean value: 0.8371497132209035

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.64705882 0.73333333 0.55       0.61111111 0.8        0.8
 0.75       0.85714286 0.76923077 0.78571429]

mean value: 0.7303591180061768

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.90909091 1.        ]

mean value: 0.990909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.83333333 0.625      0.70833333 0.86363636 0.86363636
 0.81818182 0.90909091 0.81818182 0.86363636]

mean value: 0.8053030303030303

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.64705882 0.73333333 0.55       0.61111111 0.8        0.8
 0.75       0.85714286 0.71428571 0.78571429]

mean value: 0.7248646125116713

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.18

Accuracy on Blind test: 0.54

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02244925 0.04062223 0.03517199 0.03477359 0.05889964 0.01386142
 0.01386476 0.01378965 0.0137496  0.03409195]

mean value: 0.028127408027648924

key: score_time
value: [0.02053857 0.02378178 0.02394128 0.02323008 0.01209402 0.01185203
 0.01175451 0.01184201 0.01170516 0.02339268]

mean value: 0.01741321086883545

key: test_mcc
value: [0.82575758 0.66414149 0.47727273 0.65151515 0.76764947 0.91666667
 0.74047959 0.82575758 0.81818182 0.36514837]

mean value: 0.7052570438471021

key: train_mcc
value: [0.90310636 0.90310636 0.91325992 0.92194936 0.93211467 0.92213232
 0.93211467 0.86409538 0.91266437 0.92389898]

mean value: 0.9128442392047932

key: test_accuracy
value: [0.91304348 0.82608696 0.73913043 0.82608696 0.86956522 0.95652174
 0.86956522 0.91304348 0.90909091 0.68181818]

mean value: 0.850395256916996

key: train_accuracy
value: [0.95121951 0.95121951 0.95609756 0.96097561 0.96585366 0.96097561
 0.96585366 0.93170732 0.95631068 0.96116505]

mean value: 0.956137816717973

key: test_fscore
value: [0.90909091 0.83333333 0.72727273 0.81818182 0.85714286 0.95652174
 0.88       0.91666667 0.90909091 0.66666667]

mean value: 0.8473967626576322

key: train_fscore
value: [0.95238095 0.95238095 0.95734597 0.96116505 0.96618357 0.96116505
 0.96618357 0.93269231 0.95652174 0.96226415]

mean value: 0.9568283320937857

key: test_precision
value: [0.90909091 0.76923077 0.72727273 0.81818182 1.         1.
 0.84615385 0.91666667 0.90909091 0.7       ]

mean value: 0.8595687645687645

key: train_precision
value: [0.93457944 0.93457944 0.93518519 0.96116505 0.95238095 0.95192308
 0.95238095 0.91509434 0.95192308 0.93577982]

mean value: 0.942499132697801

key: test_recall
value: [0.90909091 0.90909091 0.72727273 0.81818182 0.75       0.91666667
 0.91666667 0.91666667 0.90909091 0.63636364]

mean value: 0.8409090909090909

/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:155: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:158: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: train_recall
value: [0.97087379 0.97087379 0.98058252 0.96116505 0.98039216 0.97058824
 0.98039216 0.95098039 0.96116505 0.99029126]

mean value: 0.971730439748715

key: test_roc_auc
value: [0.91287879 0.82954545 0.73863636 0.82575758 0.875      0.95833333
 0.86742424 0.91287879 0.90909091 0.68181818]

mean value: 0.8511363636363636

key: train_roc_auc
value: [0.95112317 0.95112317 0.95597754 0.96097468 0.96592423 0.96102227
 0.96592423 0.93180088 0.95631068 0.96116505]

mean value: 0.9561345897582334

key: test_jcc
value: [0.83333333 0.71428571 0.57142857 0.69230769 0.75       0.91666667
 0.78571429 0.84615385 0.83333333 0.5       ]

mean value: 0.7443223443223443

key: train_jcc
value: [0.90909091 0.90909091 0.91818182 0.92523364 0.93457944 0.92523364
 0.93457944 0.87387387 0.91666667 0.92727273]

mean value: 0.9173803072401203

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.23017311 0.2311604  0.22449851 0.22934937 0.22811341 0.22492957
 0.3225956  0.27365017 0.24024272 0.22975492]

mean value: 0.24344677925109864

key: score_time
value: [0.02268362 0.0237155  0.02395248 0.02225494 0.02246737 0.02187586
 0.02325249 0.02057958 0.02395296 0.02296972]

mean value: 0.022770452499389648

key: test_mcc
value: [0.76277007 0.56818182 0.47727273 0.65151515 0.76764947 0.82575758
 0.74047959 0.82575758 0.83205029 0.36514837]

mean value: 0.6816582649537872

key: train_mcc
value: [0.91223227 0.92211753 0.91325992 0.92194936 0.93211467 0.94164684
 0.93211467 0.86409538 0.92250402 0.92389898]

mean value: 0.9185933650622469

key: test_accuracy
value: [0.86956522 0.7826087  0.73913043 0.82608696 0.86956522 0.91304348
 0.86956522 0.91304348 0.90909091 0.68181818]

mean value: 0.8373517786561264

key: train_accuracy
value: [0.95609756 0.96097561 0.95609756 0.96097561 0.96585366 0.97073171
 0.96585366 0.93170732 0.96116505 0.96116505]

mean value: 0.9590622780014207

key: test_fscore
value: [0.84210526 0.7826087  0.72727273 0.81818182 0.85714286 0.91666667
 0.88       0.91666667 0.9        0.66666667]

mean value: 0.8307311361407471

key: train_fscore
value: [0.95652174 0.96153846 0.95734597 0.96116505 0.96618357 0.97087379
 0.96618357 0.93269231 0.96153846 0.96226415]

mean value: 0.9596307077116953

key: test_precision
value: [1.         0.75       0.72727273 0.81818182 1.         0.91666667
 0.84615385 0.91666667 1.         0.7       ]

mean value: 0.8674941724941725

key: train_precision
value: [0.95192308 0.95238095 0.93518519 0.96116505 0.95238095 0.96153846
 0.95238095 0.91509434 0.95238095 0.93577982]

mean value: 0.9470209737850626

key: test_recall
value: [0.72727273 0.81818182 0.72727273 0.81818182 0.75       0.91666667
 0.91666667 0.91666667 0.81818182 0.63636364]

mean value: 0.8045454545454546

key: train_recall
value: [0.96116505 0.97087379 0.98058252 0.96116505 0.98039216 0.98039216
 0.98039216 0.95098039 0.97087379 0.99029126]

mean value: 0.9727108319055777

key: test_roc_auc
value: [0.86363636 0.78409091 0.73863636 0.82575758 0.875      0.91287879
 0.86742424 0.91287879 0.90909091 0.68181818]

mean value: 0.8371212121212122

key: train_roc_auc
value: [0.95607272 0.96092709 0.95597754 0.96097468 0.96592423 0.9707786
 0.96592423 0.93180088 0.96116505 0.96116505]

mean value: 0.9590710070435942

key: test_jcc
value: [0.72727273 0.64285714 0.57142857 0.69230769 0.75       0.84615385
 0.78571429 0.84615385 0.81818182 0.5       ]

mean value: 0.718006993006993

key: train_jcc
value: [0.91666667 0.92592593 0.91818182 0.92523364 0.93457944 0.94339623
 0.93457944 0.87387387 0.92592593 0.92727273]

mean value: 0.9225635687626518

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
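
The pipeline repr above has a `ColumnTransformer` that min-max scales the numerical columns and one-hot encodes the categorical ones, followed by the classifier. A minimal sketch of that layout follows, assuming a toy frame with four of the logged column names. The rows, the category labels, and the target are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy rows for illustration only; the values are hypothetical.
X = pd.DataFrame(
    {
        "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3],
        "ddg_foldx": [-1.2, 0.4, 2.3, -0.5, 1.1, 0.0],
        "ss_class": ["helix", "strand", "loop", "helix", "loop", "strand"],
        "active_site": [1, 0, 0, 0, 1, 0],
    }
)
y = [1, 0, 0, 0, 1, 0]

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

# Same step names ('prep', 'num', 'cat', 'model') as in the logged repr.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])
pipe.fit(X, y)

# 2 scaled numeric columns + 3 ss_class levels + 2 active_site levels.
n_out = pipe.named_steps["prep"].transform(X).shape[1]
preds = pipe.predict(X)
print(n_out, len(preds))
```

Keeping the scaler inside the pipeline means it is refitted on each training fold during cross-validation, so no test-fold statistics leak into the scaling.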

key: fit_time
value: [0.02333498 0.02825093 0.02866554 0.02478814 0.03009248 0.02759218
 0.02531862 0.02699327 0.02669168 0.0287993 ]

mean value: 0.027052712440490723

key: score_time
value: [0.00992012 0.01171041 0.01177144 0.01170754 0.01182556 0.01178455
 0.01177168 0.01180267 0.01176238 0.01171088]

mean value: 0.0115767240524292

key: test_mcc
value: [ 0.37796447  0.49099025  0.74535599  0.57735027  0.28867513  0.42857143
  0.8660254   0.17407766 -0.31622777  0.28867513]

mean value: 0.39214579792141185

key: train_mcc
value: [0.90550595 0.81271824 0.78163175 0.81289702 0.83066386 0.86200967
 0.79775192 0.85947992 0.8603207  0.875     ]

mean value: 0.8397979034970979

key: test_accuracy
value: [0.66666667 0.73333333 0.85714286 0.78571429 0.64285714 0.71428571
 0.92857143 0.57142857 0.35714286 0.64285714]

mean value: 0.69

key: train_accuracy
value: [0.95275591 0.90551181 0.890625   0.90625    0.9140625  0.9296875
 0.8984375  0.9296875  0.9296875  0.9375    ]

mean value: 0.9194205216535433

key: test_fscore
value: [0.70588235 0.71428571 0.875      0.76923077 0.61538462 0.71428571
 0.92307692 0.66666667 0.18181818 0.61538462]

mean value: 0.6781015553074377

key: train_fscore
value: [0.953125   0.90769231 0.89230769 0.90769231 0.91729323 0.93233083
 0.896      0.93023256 0.93129771 0.9375    ]

mean value: 0.9205471635905882

key: test_precision
value: [0.6        0.83333333 0.77777778 0.83333333 0.66666667 0.71428571
 1.         0.54545455 0.25       0.66666667]

mean value: 0.6887518037518038

key: train_precision
value: [0.953125   0.88059701 0.87878788 0.89393939 0.88405797 0.89855072
 0.91803279 0.92307692 0.91044776 0.9375    ]

mean value: 0.9078115454461019

key: test_recall
value: [0.85714286 0.625      1.         0.71428571 0.57142857 0.71428571
 0.85714286 0.85714286 0.14285714 0.57142857]

mean value: 0.6910714285714286

key: train_recall
value: [0.953125   0.93650794 0.90625    0.921875   0.953125   0.96875
 0.875      0.9375     0.953125   0.9375    ]

mean value: 0.9342757936507936

key: test_roc_auc
value: [0.67857143 0.74107143 0.85714286 0.78571429 0.64285714 0.71428571
 0.92857143 0.57142857 0.35714286 0.64285714]

mean value: 0.6919642857142857

key: train_roc_auc
value: [0.95275298 0.90575397 0.890625   0.90625    0.9140625  0.9296875
 0.8984375  0.9296875  0.9296875  0.9375    ]

mean value: 0.9194444444444444

key: test_jcc
value: [0.54545455 0.55555556 0.77777778 0.625      0.44444444 0.55555556
 0.85714286 0.5        0.1        0.44444444]

mean value: 0.540537518037518

key: train_jcc
value: [0.91044776 0.83098592 0.80555556 0.83098592 0.84722222 0.87323944
 0.8115942  0.86956522 0.87142857 0.88235294]

mean value: 0.8533377739472339
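
The key names in this block (`fit_time`, `score_time`, and `test_*` / `train_*` pairs) match the dictionary returned by `sklearn.model_selection.cross_validate` with `return_train_score=True`. The sketch below reproduces that shape under stated assumptions: the data are synthetic, the mapping of scorer names to `mcc`, `fscore` and `jcc` is a guess from the key names, and `max_iter=1000` is added only to keep the example quiet.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Synthetic stand-in data; not the dataset used in this log.
X, y = make_classification(n_samples=140, n_features=10, random_state=42)

# Dict keys become the suffixes of the 'test_' / 'train_' result keys.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

scores = cross_validate(
    LogisticRegression(random_state=42, max_iter=1000),
    X,
    y,
    cv=10,  # ten folds, giving ten values per key as in the log
    scoring=scoring,
    return_train_score=True,
)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

A large gap between the `train_*` and `test_*` means, as seen for several metrics above, is the usual sign of overfitting on a small training set.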

MCC on Blind test: 0.39

Accuracy on Blind test: 0.69

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
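
The warning above names two remedies: raising `max_iter` or scaling the data. A minimal sketch of the first follows, assuming synthetic data and a hypothetical helper `count_convergence_warnings`. The iteration limits 5 and 5000 are illustrative and are not tuned for the pncA features.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; not the dataset used in this log.
X, y = make_classification(n_samples=120, n_features=20, random_state=42)


def count_convergence_warnings(max_iter):
    """Fit a scaled LogisticRegressionCV and count ConvergenceWarnings raised."""
    model = make_pipeline(
        MinMaxScaler(),
        LogisticRegressionCV(max_iter=max_iter, cv=3, random_state=42),
    )
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, not just the first
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


low = count_convergence_warnings(5)      # deliberately too few iterations
high = count_convergence_warnings(5000)  # generous limit
print("warnings with max_iter=5:", low)
print("warnings with max_iter=5000:", high)
```

`LogisticRegressionCV` fits one model per fold and per regularisation strength, so a single outer fit can emit the same warning many times, which is consistent with the repetition in this log.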
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.85289168 0.70059133 0.73125625 0.85352159 0.73122883 0.70029855
 0.80738592 0.64156055 0.63386559 0.79457211]

mean value: 0.7447172403335571

key: score_time
value: [0.01466966 0.01212597 0.01516724 0.01518154 0.01209664 0.01497483
 0.01526499 0.01661897 0.01539016 0.01516986]

mean value: 0.014665985107421875

key: test_mcc
value: [ 0.21821789  0.33928571  0.57735027  0.8660254   0.42857143  0.57735027
  0.74535599  0.42857143 -0.14285714  0.1490712 ]

mean value: 0.4186942451971027

key: train_mcc
value: [1.         0.93748452 0.90669283 1.         0.89073374 1.
 1.         1.         1.         1.        ]

mean value: 0.9734911092040202

key: test_accuracy
value: [0.6        0.66666667 0.78571429 0.92857143 0.71428571 0.78571429
 0.85714286 0.71428571 0.42857143 0.57142857]

mean value: 0.7052380952380952

key: train_accuracy
value: [1.         0.96850394 0.953125   1.         0.9453125  1.
 1.         1.         1.         1.        ]

mean value: 0.9866941437007875

key: test_fscore
value: [0.625      0.66666667 0.8        0.92307692 0.71428571 0.76923077
 0.83333333 0.71428571 0.42857143 0.5       ]

mean value: 0.6974450549450549

key: train_fscore
value: [1.         0.96875    0.95384615 1.         0.94573643 1.
 1.         1.         1.         1.        ]

mean value: 0.9868332587954681

key: test_precision
value: [0.55555556 0.71428571 0.75       1.         0.71428571 0.83333333
 1.         0.71428571 0.42857143 0.6       ]

mean value: 0.731031746031746

key: train_precision
value: [1.         0.95384615 0.93939394 1.         0.93846154 1.
 1.         1.         1.         1.        ]

mean value: 0.9831701631701631

key: test_recall
value: [0.71428571 0.625      0.85714286 0.85714286 0.71428571 0.71428571
 0.71428571 0.71428571 0.42857143 0.42857143]

mean value: 0.6767857142857143

key: train_recall
value: [1.         0.98412698 0.96875    1.         0.953125   1.
 1.         1.         1.         1.        ]

mean value: 0.9906001984126984

key: test_roc_auc
value: [0.60714286 0.66964286 0.78571429 0.92857143 0.71428571 0.78571429
 0.85714286 0.71428571 0.42857143 0.57142857]

mean value: 0.70625

key: train_roc_auc
value: [1.         0.96862599 0.953125   1.         0.9453125  1.
 1.         1.         1.         1.        ]

mean value: 0.9867063492063493

key: test_jcc
value: [0.45454545 0.5        0.66666667 0.85714286 0.55555556 0.625
 0.71428571 0.55555556 0.27272727 0.33333333]

mean value: 0.553481240981241

key: train_jcc
value: [1.         0.93939394 0.91176471 1.         0.89705882 1.
 1.         1.         1.         1.        ]

mean value: 0.9748217468805704

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01235104 0.01005244 0.01008081 0.00952315 0.00941539 0.00865269
 0.00883651 0.00973344 0.00860357 0.00904226]

mean value: 0.009629130363464355

key: score_time
value: [0.01814413 0.00991583 0.00927973 0.00927973 0.0089798  0.00896645
 0.00927234 0.00917888 0.00874782 0.00863695]

mean value: 0.01004016399383545

key: test_mcc
value: [ 0.26189246  0.18898224  0.17407766  0.40824829  0.17407766  0.31622777
  0.1490712   0.         -0.2773501   0.31622777]

mean value: 0.17114549346091681

key: train_mcc
value: [0.41221894 0.3438986  0.41858962 0.40451992 0.43084241 0.4031367
 0.35377457 0.44649977 0.39637502 0.36808134]

mean value: 0.3977936882004

key: test_accuracy
value: [0.6        0.6        0.57142857 0.64285714 0.57142857 0.64285714
 0.57142857 0.5        0.42857143 0.64285714]

mean value: 0.5771428571428572

key: train_accuracy
value: [0.67716535 0.62992126 0.6953125  0.640625   0.6953125  0.6796875
 0.65625    0.6953125  0.6796875  0.6640625 ]

mean value: 0.6713336614173229

key: test_fscore
value: [0.66666667 0.66666667 0.66666667 0.73684211 0.66666667 0.70588235
 0.625      0.63157895 0.6        0.70588235]

mean value: 0.6671852425180598

key: train_fscore
value: [0.74534161 0.71856287 0.74172185 0.73563218 0.7483871  0.7388535
 0.72151899 0.75471698 0.73548387 0.72611465]

mean value: 0.7366333616453036

key: test_precision
value: [0.54545455 0.6        0.54545455 0.58333333 0.54545455 0.6
 0.55555556 0.5        0.46153846 0.6       ]

mean value: 0.5536790986790987

key: train_precision
value: [0.6185567  0.57692308 0.64367816 0.58181818 0.63736264 0.62365591
 0.60638298 0.63157895 0.62637363 0.61290323]

mean value: 0.6159233450304762

key: test_recall
value: [0.85714286 0.75       0.85714286 1.         0.85714286 0.85714286
 0.71428571 0.85714286 0.85714286 0.85714286]

mean value: 0.8464285714285714

key: train_recall
value: [0.9375     0.95238095 0.875      1.         0.90625    0.90625
 0.890625   0.9375     0.890625   0.890625  ]

mean value: 0.9186755952380953

key: test_roc_auc
value: [0.61607143 0.58928571 0.57142857 0.64285714 0.57142857 0.64285714
 0.57142857 0.5        0.42857143 0.64285714]

mean value: 0.5776785714285715

key: train_roc_auc
value: [0.67509921 0.63244048 0.6953125  0.640625   0.6953125  0.6796875
 0.65625    0.6953125  0.6796875  0.6640625 ]

mean value: 0.6713789682539683

key: test_jcc
value: [0.5        0.5        0.5        0.58333333 0.5        0.54545455
 0.45454545 0.46153846 0.42857143 0.54545455]

mean value: 0.5018897768897769

key: train_jcc
value: [0.59405941 0.56074766 0.58947368 0.58181818 0.59793814 0.58585859
 0.56435644 0.60606061 0.58163265 0.57      ]

mean value: 0.5831945360474582

MCC on Blind test: 0.43

Accuracy on Blind test: 0.69

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00890517 0.00869513 0.00965166 0.00883913 0.00877833 0.00885248
 0.00935841 0.00872326 0.00894189 0.00988293]

mean value: 0.009062838554382325

key: score_time
value: [0.00861073 0.00870466 0.00879693 0.0087924  0.00871682 0.00864601
 0.0091064  0.00876236 0.0087378  0.00916338]

mean value: 0.008803749084472656

key: test_mcc
value: [ 0.18898224  0.49099025  0.4472136  -0.31622777  0.14285714  0.42857143
  0.         -0.1490712  -0.63245553  0.1490712 ]

mean value: 0.0749931358413612

key: train_mcc
value: [0.48209995 0.40158859 0.438357   0.42233925 0.42610928 0.40946151
 0.43943537 0.50024432 0.53229065 0.438357  ]

mean value: 0.44902829325504817

key: test_accuracy
value: [0.6        0.73333333 0.71428571 0.35714286 0.57142857 0.71428571
 0.5        0.42857143 0.21428571 0.57142857]

mean value: 0.5404761904761904

key: train_accuracy
value: [0.74015748 0.7007874  0.71875    0.7109375  0.7109375  0.703125
 0.71875    0.75       0.765625   0.71875   ]

mean value: 0.7237819881889764

key: test_fscore
value: [0.5        0.71428571 0.75       0.18181818 0.57142857 0.71428571
 0.36363636 0.5        0.         0.5       ]

mean value: 0.47954545454545455

key: train_fscore
value: [0.73170732 0.69354839 0.70967742 0.704      0.68907563 0.68333333
 0.70491803 0.75384615 0.75806452 0.70967742]

mean value: 0.7137848209227128

key: test_precision
value: [0.6        0.83333333 0.66666667 0.25       0.57142857 0.71428571
 0.5        0.44444444 0.         0.6       ]

mean value: 0.518015873015873

key: train_precision
value: [0.76271186 0.70491803 0.73333333 0.72131148 0.74545455 0.73214286
 0.74137931 0.74242424 0.78333333 0.73333333]

mean value: 0.7400342327969973

key: test_recall
value: [0.42857143 0.625      0.85714286 0.14285714 0.57142857 0.71428571
 0.28571429 0.57142857 0.         0.42857143]

mean value: 0.46249999999999997

key: train_recall
value: [0.703125   0.68253968 0.6875     0.6875     0.640625   0.640625
 0.671875   0.765625   0.734375   0.6875    ]

mean value: 0.6901289682539683

key: test_roc_auc
value: [0.58928571 0.74107143 0.71428571 0.35714286 0.57142857 0.71428571
 0.5        0.42857143 0.21428571 0.57142857]

mean value: 0.5401785714285714

key: train_roc_auc
value: [0.74045139 0.70064484 0.71875    0.7109375  0.7109375  0.703125
 0.71875    0.75       0.765625   0.71875   ]

mean value: 0.7237971230158731

key: test_jcc
value: [0.33333333 0.55555556 0.6        0.1        0.4        0.55555556
 0.22222222 0.33333333 0.         0.33333333]

mean value: 0.3433333333333333

key: train_jcc
value: [0.57692308 0.5308642  0.55       0.54320988 0.52564103 0.51898734
 0.5443038  0.60493827 0.61038961 0.55      ]

mean value: 0.5555257197873231

MCC on Blind test: 0.27

Accuracy on Blind test: 0.63

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00843072 0.00828362 0.0094943  0.00941944 0.00952077 0.00883842
 0.00867176 0.00860167 0.00916886 0.00823045]

mean value: 0.008866000175476074

key: score_time
value: [0.00952911 0.00987101 0.01023984 0.0103004  0.01070189 0.01650286
 0.01474071 0.01097417 0.00975394 0.00930381]

mean value: 0.011191773414611816

key: test_mcc
value: [ 0.47245559 -0.37796447 -0.28867513 -0.1490712   0.14285714  0.
  0.         -0.28867513 -0.57735027 -0.28867513]

mean value: -0.13550986103646007

key: train_mcc
value: [0.41894709 0.29176205 0.36047677 0.29866683 0.39298268 0.438357
 0.37518324 0.375      0.438357   0.3480246 ]

mean value: 0.3737757275384978

key: test_accuracy
value: [0.73333333 0.33333333 0.35714286 0.42857143 0.57142857 0.5
 0.5        0.35714286 0.21428571 0.35714286]

mean value: 0.4352380952380952

key: train_accuracy
value: [0.70866142 0.64566929 0.6796875  0.6484375  0.6953125  0.71875
 0.6875     0.6875     0.71875    0.671875  ]

mean value: 0.6862143208661418

key: test_fscore
value: [0.66666667 0.44444444 0.4        0.5        0.57142857 0.46153846
 0.46153846 0.30769231 0.26666667 0.30769231]

mean value: 0.43876678876678876

key: train_fscore
value: [0.69918699 0.62809917 0.66666667 0.62809917 0.67768595 0.70967742
 0.69230769 0.6875     0.70967742 0.6440678 ]

mean value: 0.6742968283684786

key: test_precision
value: [0.8        0.4        0.375      0.44444444 0.57142857 0.5
 0.5        0.33333333 0.25       0.33333333]

mean value: 0.45075396825396824

key: train_precision
value: [0.72881356 0.65517241 0.69491525 0.66666667 0.71929825 0.73333333
 0.68181818 0.6875     0.73333333 0.7037037 ]

mean value: 0.700455469182168

key: test_recall
value: [0.57142857 0.5        0.42857143 0.57142857 0.57142857 0.42857143
 0.42857143 0.28571429 0.28571429 0.28571429]

mean value: 0.4357142857142857

key: train_recall
value: [0.671875  0.6031746 0.640625  0.59375   0.640625  0.6875    0.703125
 0.6875    0.6875    0.59375  ]

mean value: 0.6509424603174603

key: test_roc_auc
value: [0.72321429 0.32142857 0.35714286 0.42857143 0.57142857 0.5
 0.5        0.35714286 0.21428571 0.35714286]

mean value: 0.4330357142857143

key: train_roc_auc
value: [0.70895337 0.6453373  0.6796875  0.6484375  0.6953125  0.71875
 0.6875     0.6875     0.71875    0.671875  ]

mean value: 0.6862103174603175

key: test_jcc
value: [0.5        0.28571429 0.25       0.33333333 0.4        0.3
 0.3        0.18181818 0.15384615 0.18181818]

mean value: 0.28865301365301366

key: train_jcc
value: [0.5375     0.45783133 0.5        0.45783133 0.5125     0.55
 0.52941176 0.52380952 0.55       0.475     ]

mean value: 0.5093883939117816

MCC on Blind test: 0.17

Accuracy on Blind test: 0.58

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01124406 0.01096201 0.01099586 0.00968695 0.00980282 0.00962329
 0.00957036 0.01035261 0.00969028 0.00992322]

mean value: 0.010185146331787109

key: score_time
value: [0.00972486 0.00949264 0.00957704 0.00875735 0.00879574 0.00882769
 0.0090363  0.00884104 0.00873518 0.00879645]

mean value: 0.009058427810668946

key: test_mcc
value: [ 0.33928571  0.09449112  0.4472136   0.71428571  0.14285714  0.42857143
  0.31622777  0.         -0.31622777  0.        ]

mean value: 0.21667047137522646

key: train_mcc
value: [0.63789683 0.6852819  0.64070322 0.71910121 0.6253054  0.67195703
 0.6253054  0.78125    0.72015793 0.59491308]

mean value: 0.6701871989258454

key: test_accuracy
value: [0.66666667 0.53333333 0.71428571 0.85714286 0.57142857 0.71428571
 0.64285714 0.5        0.35714286 0.5       ]

mean value: 0.6057142857142858

key: train_accuracy
value: [0.81889764 0.84251969 0.8203125  0.859375   0.8125     0.8359375
 0.8125     0.890625   0.859375   0.796875  ]

mean value: 0.8348917322834646

key: test_fscore
value: [0.66666667 0.46153846 0.75       0.85714286 0.57142857 0.71428571
 0.54545455 0.53333333 0.18181818 0.46153846]

mean value: 0.5743206793206793

key: train_fscore
value: [0.81889764 0.83870968 0.81889764 0.86153846 0.81538462 0.83464567
 0.81538462 0.890625   0.86363636 0.79032258]

mean value: 0.8348042258890461

key: test_precision
value: [0.625      0.6        0.66666667 0.85714286 0.57142857 0.71428571
 0.75       0.5        0.25       0.5       ]

mean value: 0.603452380952381

key: train_precision
value: [0.82539683 0.85245902 0.82539683 0.84848485 0.8030303  0.84126984
 0.8030303  0.890625   0.83823529 0.81666667]

mean value: 0.8344594923786702

key: test_recall
value: [0.71428571 0.375      0.85714286 0.85714286 0.57142857 0.71428571
 0.42857143 0.57142857 0.14285714 0.42857143]

mean value: 0.5660714285714286

key: train_recall
value: [0.8125     0.82539683 0.8125     0.875      0.828125   0.828125
 0.828125   0.890625   0.890625   0.765625  ]

mean value: 0.8356646825396825

key: test_roc_auc
value: [0.66964286 0.54464286 0.71428571 0.85714286 0.57142857 0.71428571
 0.64285714 0.5        0.35714286 0.5       ]

mean value: 0.6071428571428572

key: train_roc_auc
value: [0.81894841 0.84238591 0.8203125  0.859375   0.8125     0.8359375
 0.8125     0.890625   0.859375   0.796875  ]

mean value: 0.8348834325396826

key: test_jcc
value: [0.5        0.3        0.6        0.75       0.4        0.55555556
 0.375      0.36363636 0.1        0.3       ]

mean value: 0.4244191919191919

key: train_jcc
value: [0.69333333 0.72222222 0.69333333 0.75675676 0.68831169 0.71621622
 0.68831169 0.8028169  0.76       0.65333333]

mean value: 0.7174635473227022

MCC on Blind test: 0.42

Accuracy on Blind test: 0.71

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.72048855 0.58740497 0.71089745 0.57400823 0.57025862 0.57146072
 0.68641615 0.55554175 0.5684855  0.64807653]

mean value: 0.619303846359253

key: score_time
value: [0.01467466 0.01231575 0.01452303 0.01452589 0.01463056 0.01481271
 0.01819897 0.01205873 0.01488829 0.01502252]

mean value: 0.014565110206604004

key: test_mcc
value: [ 0.33928571  0.21821789  0.1490712   0.57735027  0.14285714  0.57735027
  0.8660254   0.         -0.28867513  0.28867513]

mean value: 0.28701578880425255

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.6        0.57142857 0.78571429 0.57142857 0.78571429
 0.92857143 0.5        0.35714286 0.64285714]

mean value: 0.6409523809523809

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.57142857 0.625      0.8        0.57142857 0.8
 0.93333333 0.53333333 0.30769231 0.61538462]

mean value: 0.6424267399267399

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.625      0.66666667 0.55555556 0.75       0.57142857 0.75
 0.875      0.5        0.33333333 0.66666667]

mean value: 0.6293650793650793

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.5        0.71428571 0.85714286 0.57142857 0.85714286
 1.         0.57142857 0.28571429 0.57142857]

mean value: 0.6642857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.66964286 0.60714286 0.57142857 0.78571429 0.57142857 0.78571429
 0.92857143 0.5        0.35714286 0.64285714]

mean value: 0.6419642857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.4        0.45454545 0.66666667 0.4        0.66666667
 0.875      0.36363636 0.18181818 0.44444444]

mean value: 0.49527777777777776

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.29

Accuracy on Blind test: 0.64

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.02736211 0.01128983 0.01124406 0.01253748 0.01097155 0.01206899
 0.01089215 0.01065326 0.01102686 0.01143122]

mean value: 0.012947750091552735

key: score_time
value: [0.01159692 0.00895143 0.00873518 0.00955987 0.00854182 0.00929856
 0.00847149 0.00848365 0.00856256 0.00918412]

mean value: 0.00913856029510498

key: test_mcc
value: [0.33928571 0.875      1.         0.8660254  0.57735027 0.8660254
 0.63245553 0.         1.         0.28867513]

mean value: 0.6444817457672706

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.93333333 1.         0.92857143 0.78571429 0.92857143
 0.78571429 0.5        1.         0.64285714]

mean value: 0.8171428571428572

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.93333333 1.         0.92307692 0.76923077 0.93333333
 0.72727273 0.53333333 1.         0.61538462]

mean value: 0.8101631701631702

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.625      1.         1.         1.         0.83333333 0.875
 1.         0.5        1.         0.66666667]

mean value: 0.85

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.875      1.         0.85714286 0.71428571 1.
 0.57142857 0.57142857 1.         0.57142857]

mean value: 0.7875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.66964286 0.9375     1.         0.92857143 0.78571429 0.92857143
 0.78571429 0.5        1.         0.64285714]

mean value: 0.8178571428571428

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.875      1.         0.85714286 0.625      0.875
 0.57142857 0.36363636 1.         0.44444444]

mean value: 0.7111652236652236

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.52

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.08901286 0.09013486 0.08958101 0.08881617 0.08836842 0.08858609
 0.08868885 0.08849192 0.08824515 0.08973527]

mean value: 0.08896605968475342

key: score_time
value: [0.01702809 0.01857686 0.01709199 0.01713276 0.01712823 0.01718926
 0.01714444 0.01747155 0.01718402 0.01791286]

mean value: 0.01738600730895996

key: test_mcc
value: [ 0.19642857  0.07142857  0.74535599  0.57735027  0.42857143  0.57735027
  0.4472136  -0.28867513 -0.4472136   0.1490712 ]

mean value: 0.24568811662129258

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.6        0.53333333 0.85714286 0.78571429 0.71428571 0.78571429
 0.71428571 0.35714286 0.28571429 0.57142857]

mean value: 0.6204761904761905

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.57142857 0.53333333 0.83333333 0.8        0.71428571 0.8
 0.66666667 0.4        0.16666667 0.5       ]

mean value: 0.5985714285714285

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.57142857 0.57142857 1.         0.75       0.71428571 0.75
 0.8        0.375      0.2        0.6       ]

mean value: 0.6332142857142857

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.57142857 0.5        0.71428571 0.85714286 0.71428571 0.85714286
 0.57142857 0.42857143 0.14285714 0.42857143]

mean value: 0.5785714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.59821429 0.53571429 0.85714286 0.78571429 0.71428571 0.78571429
 0.71428571 0.35714286 0.28571429 0.57142857]

mean value: 0.6205357142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.4        0.36363636 0.71428571 0.66666667 0.55555556 0.66666667
 0.5        0.25       0.09090909 0.33333333]

mean value: 0.4541053391053391

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.31

Accuracy on Blind test: 0.65

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.0096097  0.0091033  0.00874829 0.00875807 0.008811   0.00929141
 0.00984359 0.00904584 0.00888586 0.00879669]

mean value: 0.009089374542236328

key: score_time
value: [0.00904441 0.00886655 0.00872993 0.00869799 0.0087254  0.0087328
 0.00910592 0.00863886 0.00867748 0.00856519]

mean value: 0.00877845287322998

key: test_mcc
value: [ 0.33928571  0.07142857  0.57735027  0.42857143  0.57735027  0.1490712
  0.         -0.14285714 -0.42857143  0.1490712 ]

mean value: 0.17207000782363663

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.53333333 0.78571429 0.71428571 0.78571429 0.57142857
 0.5        0.42857143 0.28571429 0.57142857]

mean value: 0.5842857142857143

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.66666667 0.53333333 0.8        0.71428571 0.8        0.625
 0.36363636 0.42857143 0.28571429 0.5       ]

mean value: 0.5717207792207792

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.625      0.57142857 0.75       0.71428571 0.75       0.55555556
 0.5        0.42857143 0.28571429 0.6       ]

mean value: 0.5780555555555555

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.71428571 0.5        0.85714286 0.71428571 0.85714286 0.71428571
 0.28571429 0.42857143 0.28571429 0.42857143]

mean value: 0.5785714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.66964286 0.53571429 0.78571429 0.71428571 0.78571429 0.57142857
 0.5        0.42857143 0.28571429 0.57142857]

mean value: 0.5848214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.5        0.36363636 0.66666667 0.55555556 0.66666667 0.45454545
 0.22222222 0.27272727 0.16666667 0.33333333]

mean value: 0.4202020202020202

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.55

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.13835144 1.1283567  1.13199878 1.13944674 1.13446093 1.13452864
 1.12859964 1.1291821  1.12874389 1.12446833]

mean value: 1.1318137168884277

key: score_time
value: [0.08793807 0.08876872 0.09104156 0.08774018 0.08761907 0.08778667
 0.14704132 0.09132361 0.09439731 0.09718728]

mean value: 0.0960843801498413

key: test_mcc
value: [0.37796447 0.76376262 0.8660254  0.8660254  0.8660254  0.74535599
 0.74535599 0.         0.4472136  0.42857143]

mean value: 0.6106300309259763

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.86666667 0.92857143 0.92857143 0.92857143 0.85714286
 0.85714286 0.5        0.71428571 0.71428571]

mean value: 0.7961904761904762

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.70588235 0.85714286 0.92307692 0.92307692 0.92307692 0.875
 0.83333333 0.53333333 0.66666667 0.71428571]

mean value: 0.795487502693385

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.6        1.         1.         1.         1.         0.77777778
 1.         0.5        0.8        0.71428571]

mean value: 0.8392063492063492

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85714286 0.75       0.85714286 0.85714286 0.85714286 1.
 0.71428571 0.57142857 0.57142857 0.71428571]

mean value: 0.775

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.67857143 0.875      0.92857143 0.92857143 0.92857143 0.85714286
 0.85714286 0.5        0.71428571 0.71428571]

mean value: 0.7982142857142858

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.54545455 0.75       0.85714286 0.85714286 0.85714286 0.77777778
 0.71428571 0.36363636 0.5        0.55555556]

mean value: 0.6778138528138528

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.38

Accuracy on Blind test: 0.69

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.83326268 0.87439442 0.91672993 0.89082122 0.88721848 0.86894846
 0.90624857 0.85243893 0.90239453 0.89272261]

mean value: 0.8825179815292359

key: score_time
value: [0.22324824 0.22895718 0.18167663 0.22090197 0.22210288 0.22428799
 0.13536072 0.24140263 0.21208525 0.23036623]

mean value: 0.21203896999359131

key: test_mcc
value: [0.37796447 0.60714286 0.8660254  1.         0.71428571 0.74535599
 0.63245553 0.1490712  0.31622777 0.42857143]

mean value: 0.5837100365844096

key: train_mcc
value: [0.93745372 0.93889821 0.93933644 0.93933644 0.90802522 0.95417386
 0.92288947 0.95417386 0.93933644 0.9379581 ]

mean value: 0.9371581765131688

key: test_accuracy
value: [0.66666667 0.8        0.92857143 1.         0.85714286 0.85714286
 0.78571429 0.57142857 0.64285714 0.71428571]

mean value: 0.7823809523809524

key: train_accuracy
value: [0.96850394 0.96850394 0.96875    0.96875    0.953125   0.9765625
 0.9609375  0.9765625  0.96875    0.96875   ]

mean value: 0.9679195374015748

key: test_fscore
value: [0.70588235 0.8        0.93333333 1.         0.85714286 0.875
 0.72727273 0.625      0.54545455 0.71428571]

mean value: 0.7783371530430354

key: train_fscore
value: [0.96923077 0.96923077 0.96969697 0.96969697 0.95454545 0.97709924
 0.96183206 0.97709924 0.96969697 0.96923077]

mean value: 0.9687359205679816

key: test_precision
value: [0.6        0.85714286 0.875      1.         0.85714286 0.77777778
 1.         0.55555556 0.75       0.71428571]

mean value: 0.7986904761904762

key: train_precision
value: [0.95454545 0.94029851 0.94117647 0.94117647 0.92647059 0.95522388
 0.94029851 0.95522388 0.94117647 0.95454545]

mean value: 0.9450135685210312

key: test_recall
value: [0.85714286 0.75       1.         1.         0.85714286 1.
 0.57142857 0.71428571 0.42857143 0.71428571]

mean value: 0.7892857142857143

key: train_recall
value: [0.984375 1.       1.       1.       0.984375 1.       0.984375 1.
 1.       0.984375]

mean value: 0.99375

key: test_roc_auc
value: [0.67857143 0.80357143 0.92857143 1.         0.85714286 0.85714286
 0.78571429 0.57142857 0.64285714 0.71428571]

mean value: 0.7839285714285714

key: train_roc_auc
value: [0.96837798 0.96875    0.96875    0.96875    0.953125   0.9765625
 0.9609375  0.9765625  0.96875    0.96875   ]

mean value: 0.9679315476190476

key: test_jcc
value: [0.54545455 0.66666667 0.875      1.         0.75       0.77777778
 0.57142857 0.45454545 0.375      0.55555556]

mean value: 0.6571428571428571

key: train_jcc
value: [0.94029851 0.94029851 0.94117647 0.94117647 0.91304348 0.95522388
 0.92647059 0.95522388 0.94117647 0.94029851]

mean value: 0.9394386761842959

MCC on Blind test: 0.46

Accuracy on Blind test: 0.72

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02314949 0.00937533 0.0098002  0.00965309 0.0087173  0.00990558
 0.00964594 0.00971007 0.00986576 0.00981593]

mean value: 0.010963869094848634

key: score_time
value: [0.01428938 0.00957179 0.00965691 0.00903916 0.00860858 0.00936079
 0.00952983 0.00949478 0.00943947 0.00927114]

mean value: 0.009826183319091797

key: test_mcc
value: [ 0.18898224  0.49099025  0.4472136  -0.31622777  0.14285714  0.42857143
  0.         -0.1490712  -0.63245553  0.1490712 ]

mean value: 0.0749931358413612

key: train_mcc
value: [0.48209995 0.40158859 0.438357   0.42233925 0.42610928 0.40946151
 0.43943537 0.50024432 0.53229065 0.438357  ]

mean value: 0.44902829325504817

key: test_accuracy
value: [0.6        0.73333333 0.71428571 0.35714286 0.57142857 0.71428571
 0.5        0.42857143 0.21428571 0.57142857]

mean value: 0.5404761904761904

key: train_accuracy
value: [0.74015748 0.7007874  0.71875    0.7109375  0.7109375  0.703125
 0.71875    0.75       0.765625   0.71875   ]

mean value: 0.7237819881889764

key: test_fscore
value: [0.5        0.71428571 0.75       0.18181818 0.57142857 0.71428571
 0.36363636 0.5        0.         0.5       ]

mean value: 0.47954545454545455

key: train_fscore
value: [0.73170732 0.69354839 0.70967742 0.704      0.68907563 0.68333333
 0.70491803 0.75384615 0.75806452 0.70967742]

mean value: 0.7137848209227128

key: test_precision
value: [0.6        0.83333333 0.66666667 0.25       0.57142857 0.71428571
 0.5        0.44444444 0.         0.6       ]

mean value: 0.518015873015873

key: train_precision
value: [0.76271186 0.70491803 0.73333333 0.72131148 0.74545455 0.73214286
 0.74137931 0.74242424 0.78333333 0.73333333]

mean value: 0.7400342327969973

key: test_recall
value: [0.42857143 0.625      0.85714286 0.14285714 0.57142857 0.71428571
 0.28571429 0.57142857 0.         0.42857143]

mean value: 0.46249999999999997

key: train_recall
value: [0.703125   0.68253968 0.6875     0.6875     0.640625   0.640625
 0.671875   0.765625   0.734375   0.6875    ]

mean value: 0.6901289682539683

key: test_roc_auc
value: [0.58928571 0.74107143 0.71428571 0.35714286 0.57142857 0.71428571
 0.5        0.42857143 0.21428571 0.57142857]

mean value: 0.5401785714285714

key: train_roc_auc
value: [0.74045139 0.70064484 0.71875    0.7109375  0.7109375  0.703125
 0.71875    0.75       0.765625   0.71875   ]

mean value: 0.7237971230158731

key: test_jcc
value: [0.33333333 0.55555556 0.6        0.1        0.4        0.55555556
 0.22222222 0.33333333 0.         0.33333333]

mean value: 0.3433333333333333

key: train_jcc
value: [0.57692308 0.5308642  0.55       0.54320988 0.52564103 0.51898734
 0.5443038  0.60493827 0.61038961 0.55      ]

mean value: 0.5555257197873231

MCC on Blind test: 0.27

Accuracy on Blind test: 0.63

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06581926 0.05272436 0.0533154  0.04687142 0.04571605 0.04804492
 0.05116105 0.04486775 0.04888487 0.05014682]

mean value: 0.0507551908493042

key: score_time
value: [0.01139712 0.0113914  0.01027155 0.01038671 0.0105803  0.01110101
 0.0105691  0.0107646  0.01118851 0.01136994]

mean value: 0.010902023315429688

key: test_mcc
value: [0.66143783 0.87287156 1.         0.8660254  0.71428571 0.71428571
 0.74535599 0.1490712  0.8660254  0.42857143]

mean value: 0.7017930244421767

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        0.93333333 1.         0.92857143 0.85714286 0.85714286
 0.85714286 0.57142857 0.92857143 0.71428571]

mean value: 0.8447619047619047

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.94117647 1.         0.92307692 0.85714286 0.85714286
 0.83333333 0.625      0.93333333 0.71428571]

mean value: 0.850802090066796

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.88888889 1.         1.         0.85714286 0.85714286
 1.         0.55555556 0.875      0.71428571]

mean value: 0.8448015873015873

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         0.85714286 0.85714286 0.85714286
 0.71428571 0.71428571 1.         0.71428571]

mean value: 0.8714285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.92857143 1.         0.92857143 0.85714286 0.85714286
 0.85714286 0.57142857 0.92857143 0.71428571]

mean value: 0.8455357142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.88888889 1.         0.85714286 0.75       0.75
 0.71428571 0.45454545 0.875      0.55555556]

mean value: 0.754541847041847

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.04

Accuracy on Blind test: 0.51

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.02718544 0.04948449 0.05275106 0.04704428 0.05495071 0.05065632
 0.04379416 0.04743075 0.04433584 0.05247545]

mean value: 0.04701085090637207

key: score_time
value: [0.02026653 0.02336526 0.01186585 0.02184844 0.01182151 0.02222586
 0.02072549 0.02242494 0.02011228 0.02445412]

mean value: 0.019911026954650878

key: test_mcc
value: [-0.04029115  0.09449112  0.57735027  0.42857143  0.          0.28867513
  0.          0.          0.1490712   0.        ]

mean value: 0.1497868000906891

key: train_mcc
value: [1.         1.         1.         1.         1.         0.96922337
 1.         1.         1.         1.        ]

mean value: 0.9969223369195119

key: test_accuracy
value: [0.46666667 0.53333333 0.78571429 0.71428571 0.5        0.64285714
 0.5        0.5        0.57142857 0.5       ]

mean value: 0.5714285714285714

key: train_accuracy
value: [1.       1.       1.       1.       1.       0.984375 1.       1.
 1.       1.      ]

mean value: 0.9984375

key: test_fscore
value: [0.55555556 0.46153846 0.76923077 0.71428571 0.36363636 0.61538462
 0.36363636 0.53333333 0.625      0.46153846]

mean value: 0.5463139638139638

key: train_fscore
value: [1.         1.         1.         1.         1.         0.98461538
 1.         1.         1.         1.        ]

mean value: 0.9984615384615385

key: test_precision
value: [0.45454545 0.6        0.83333333 0.71428571 0.5        0.66666667
 0.5        0.5        0.55555556 0.5       ]

mean value: 0.5824386724386724

key: train_precision
value: [1.         1.         1.         1.         1.         0.96969697
 1.         1.         1.         1.        ]

mean value: 0.996969696969697

key: test_recall
value: [0.71428571 0.375      0.71428571 0.71428571 0.28571429 0.57142857
 0.28571429 0.57142857 0.71428571 0.42857143]

mean value: 0.5375

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.48214286 0.54464286 0.78571429 0.71428571 0.5        0.64285714
 0.5        0.5        0.57142857 0.5       ]

mean value: 0.5741071428571428

key: train_roc_auc
value: [1.       1.       1.       1.       1.       0.984375 1.       1.
 1.       1.      ]

mean value: 0.9984375

key: test_jcc
value: [0.38461538 0.3        0.625      0.55555556 0.22222222 0.44444444
 0.22222222 0.36363636 0.45454545 0.3       ]

mean value: 0.3872241647241647

key: train_jcc
value: [1.         1.         1.         1.         1.         0.96969697
 1.         1.         1.         1.        ]

mean value: 0.996969696969697

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0241859  0.00881934 0.00854516 0.00849938 0.00865054 0.00857329
 0.00858045 0.00881457 0.00852299 0.00857997]

mean value: 0.010177159309387207

key: score_time
value: [0.00976634 0.00857759 0.00845194 0.00852895 0.00844574 0.00858092
 0.00846004 0.00851583 0.0085721  0.00858021]

mean value: 0.008647966384887695

key: test_mcc
value: [ 0.21821789  0.32732684  0.17407766  0.71428571 -0.1490712   0.57735027
  0.1490712   0.4472136   0.1490712   0.1490712 ]

mean value: 0.2756614357520949

key: train_mcc
value: [0.38660962 0.3754942  0.42824786 0.39298268 0.42442129 0.39298268
 0.38177086 0.34442336 0.44095855 0.43943537]

mean value: 0.40073264569464856

key: test_accuracy
value: [0.6        0.66666667 0.57142857 0.85714286 0.42857143 0.78571429
 0.57142857 0.71428571 0.57142857 0.57142857]

mean value: 0.6338095238095238

key: train_accuracy
value: [0.69291339 0.68503937 0.7109375  0.6953125  0.7109375  0.6953125
 0.6875     0.671875   0.71875    0.71875   ]

mean value: 0.6987327755905511

key: test_fscore
value: [0.625      0.70588235 0.66666667 0.85714286 0.5        0.8
 0.5        0.75       0.5        0.625     ]

mean value: 0.6529691876750701

key: train_fscore
value: [0.70676692 0.70588235 0.73381295 0.71111111 0.72592593 0.71111111
 0.71428571 0.68181818 0.73529412 0.73134328]

mean value: 0.715735166535589

key: test_precision
value: [0.55555556 0.66666667 0.54545455 0.85714286 0.44444444 0.75
 0.6        0.66666667 0.6        0.55555556]

mean value: 0.6241486291486291

key: train_precision
value: [0.68115942 0.65753425 0.68       0.67605634 0.69014085 0.67605634
 0.65789474 0.66176471 0.69444444 0.7       ]

mean value: 0.6775051075160861

key: test_recall
value: [0.71428571 0.75       0.85714286 0.85714286 0.57142857 0.85714286
 0.42857143 0.85714286 0.42857143 0.71428571]

mean value: 0.7035714285714285

key: train_recall
value: [0.734375   0.76190476 0.796875   0.75       0.765625   0.75
 0.78125    0.703125   0.78125    0.765625  ]

mean value: 0.7590029761904762

key: test_roc_auc
value: [0.60714286 0.66071429 0.57142857 0.85714286 0.42857143 0.78571429
 0.57142857 0.71428571 0.57142857 0.57142857]

mean value: 0.6339285714285714

key: train_roc_auc
value: [0.69258433 0.68563988 0.7109375  0.6953125  0.7109375  0.6953125
 0.6875     0.671875   0.71875    0.71875   ]

mean value: 0.6987599206349207

key: test_jcc
value: [0.45454545 0.54545455 0.5        0.75       0.33333333 0.66666667
 0.33333333 0.6        0.33333333 0.45454545]

mean value: 0.4971212121212121

key: train_jcc
value: [0.54651163 0.54545455 0.57954545 0.55172414 0.56976744 0.55172414
 0.55555556 0.51724138 0.58139535 0.57647059]

mean value: 0.5575390217567915

MCC on Blind test: 0.32

Accuracy on Blind test: 0.66
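
Note: the script that wrote this log is not shown here, so the following is only a minimal sketch of how per-fold output with these keys (fit_time, score_time, test_mcc, train_mcc, ...) could be produced with scikit-learn. The data is synthetic, the three column names are borrowed from the pipeline printout purely for illustration, and the scoring dictionary and the StratifiedKFold settings are assumptions rather than the original configuration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): two numeric and one categorical column.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = rng.integers(0, 2, n)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

# Same layout as the printed pipeline: scale numeric columns, one-hot encode
# categorical columns, then fit the classifier.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
)
pipe = Pipeline(steps=[("prep", prep), ("model", MultinomialNB())])

# Scorer names chosen so the result keys match the log (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout used in this log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

With return_train_score=True, cross_validate returns one array per key with one entry per fold, which is why each "value" in this log holds ten numbers.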

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01038313 0.01427412 0.01598454 0.01353955 0.01545024 0.01576066
 0.01404667 0.01471567 0.01347804 0.01635647]

mean value: 0.014398908615112305

key: score_time
value: [0.00853682 0.01141334 0.01140761 0.01149416 0.01144385 0.01149607
 0.01146412 0.01140809 0.01136732 0.01147699]

mean value: 0.011150836944580078

key: test_mcc
value: [0.46428571 0.56407607 0.71428571 0.71428571 0.42857143 0.74535599
 0.57735027 0.57735027 0.17407766 0.28867513]

mean value: 0.5248313967676029

key: train_mcc
value: [0.86101708 0.72678367 0.82717019 0.80168466 0.85042006 0.84063468
 0.78756153 0.90625    0.77459667 0.93933644]

mean value: 0.8315454985026968

key: test_accuracy
value: [0.73333333 0.73333333 0.85714286 0.85714286 0.71428571 0.85714286
 0.78571429 0.78571429 0.57142857 0.64285714]

mean value: 0.7538095238095238

key: train_accuracy
value: [0.92913386 0.8503937  0.90625    0.8984375  0.921875   0.9140625
 0.8828125  0.953125   0.875      0.96875   ]

mean value: 0.9099840059055118

key: test_fscore
value: [0.71428571 0.66666667 0.85714286 0.85714286 0.71428571 0.83333333
 0.8        0.8        0.66666667 0.61538462]

mean value: 0.7524908424908424

key: train_fscore
value: [0.92682927 0.82568807 0.89655172 0.9037037  0.92647059 0.90598291
 0.8951049  0.953125   0.88888889 0.96969697]

mean value: 0.9092042017437767

key: test_precision
value: [0.71428571 1.         0.85714286 0.85714286 0.71428571 1.
 0.75       0.75       0.54545455 0.66666667]

mean value: 0.7854978354978355

key: train_precision
value: [0.96610169 0.97826087 1.         0.85915493 0.875      1.
 0.81012658 0.953125   0.8        0.94117647]

mean value: 0.9182945546924652

key: test_recall
value: [0.71428571 0.5        0.85714286 0.85714286 0.71428571 0.71428571
 0.85714286 0.85714286 0.85714286 0.57142857]

mean value: 0.75

key: train_recall
value: [0.890625   0.71428571 0.8125     0.953125   0.984375   0.828125
 1.         0.953125   1.         1.        ]

mean value: 0.9136160714285715

key: test_roc_auc
value: [0.73214286 0.75       0.85714286 0.85714286 0.71428571 0.85714286
 0.78571429 0.78571429 0.57142857 0.64285714]

mean value: 0.7553571428571428

key: train_roc_auc
value: [0.92943948 0.84933036 0.90625    0.8984375  0.921875   0.9140625
 0.8828125  0.953125   0.875      0.96875   ]

mean value: 0.9099082341269842

key: test_jcc
value: [0.55555556 0.5        0.75       0.75       0.55555556 0.71428571
 0.66666667 0.66666667 0.5        0.44444444]

mean value: 0.6103174603174604

key: train_jcc
value: [0.86363636 0.703125   0.8125     0.82432432 0.8630137  0.828125
 0.81012658 0.91044776 0.8        0.94117647]

mean value: 0.8356475200651571

MCC on Blind test: 0.31

Accuracy on Blind test: 0.62

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01415157 0.01356316 0.01522613 0.01370096 0.01369357 0.01361775
 0.01383018 0.0135622  0.01277542 0.01345587]

mean value: 0.013757681846618653

key: score_time
value: [0.0115037  0.01245689 0.01166725 0.01139021 0.01141787 0.01142097
 0.01140141 0.01140237 0.01158595 0.02333975]

mean value: 0.012758636474609375

key: test_mcc
value: [0.37796447 0.         0.74535599 0.8660254  0.28867513 0.57735027
 0.63245553 0.17407766 0.         0.1490712 ]

mean value: 0.38109756595673944

key: train_mcc
value: [0.89071137 0.35476806 0.64978629 0.83643673 0.85042006 0.87542756
 0.85947992 0.45557345 0.50487816 0.90669283]

mean value: 0.7184174438156392

key: test_accuracy
value: [0.66666667 0.46666667 0.85714286 0.92857143 0.64285714 0.78571429
 0.78571429 0.57142857 0.5        0.57142857]

mean value: 0.6776190476190476

key: train_accuracy
value: [0.94488189 0.61417323 0.796875   0.9140625  0.921875   0.9375
 0.9296875  0.671875   0.703125   0.953125  ]

mean value: 0.8387180118110236

key: test_fscore
value: [0.70588235 0.         0.83333333 0.92307692 0.61538462 0.8
 0.72727273 0.4        0.63157895 0.5       ]

mean value: 0.6136528899377196

key: train_fscore
value: [0.94656489 0.36363636 0.74509804 0.90756303 0.91666667 0.93846154
 0.92913386 0.51162791 0.77108434 0.95238095]

mean value: 0.7982217573661332

key: test_precision
value: [0.6        0.         1.         1.         0.66666667 0.75
 1.         0.66666667 0.5        0.6       ]

mean value: 0.6783333333333333
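
Note: the 0. in the second fold of test_precision (with test_recall also 0. for that fold) is consistent with the UndefinedMetricWarning printed for this model: when no sample is predicted positive, precision is 0/0 and scikit-learn substitutes 0.0. The sketch below uses toy labels, not data from this run, to show the default behaviour and the `zero_division` parameter named in the warning.

```python
import warnings
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score

# Toy labels (assumption): the classifier never predicts the positive class.
y_true = [1, 0, 1, 0]
y_pred = [0, 0, 0, 0]

# Default behaviour: the score is set to 0.0 and a warning is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    p_default = precision_score(y_true, y_pred)

# Explicit zero_division: same value, chosen deliberately, no warning.
p_explicit = precision_score(y_true, y_pred, zero_division=0)

print(p_default, p_explicit, len(caught))
```

Setting zero_division explicitly does not change the fold's score; it only records that the 0.0 was a chosen convention rather than a measured precision.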

key: train_precision
value: [0.92537313 1.         1.         0.98181818 0.98214286 0.92424242
 0.93650794 1.         0.62745098 0.96774194]

mean value: 0.9345277449915785

key: test_recall
value: [0.85714286 0.         0.71428571 0.85714286 0.57142857 0.85714286
 0.57142857 0.28571429 0.85714286 0.42857143]

mean value: 0.6

key: train_recall
value: [0.96875    0.22222222 0.59375    0.84375    0.859375   0.953125
 0.921875   0.34375    1.         0.9375    ]

mean value: 0.7644097222222223

key: test_roc_auc
value: [0.67857143 0.5        0.85714286 0.92857143 0.64285714 0.78571429
 0.78571429 0.57142857 0.5        0.57142857]

mean value: 0.6821428571428572

key: train_roc_auc
value: [0.94469246 0.61111111 0.796875   0.9140625  0.921875   0.9375
 0.9296875  0.671875   0.703125   0.953125  ]

mean value: 0.8383928571428572

key: test_jcc
value: [0.54545455 0.         0.71428571 0.85714286 0.44444444 0.66666667
 0.57142857 0.25       0.46153846 0.33333333]

mean value: 0.4844294594294594

key: train_jcc
value: [0.89855072 0.22222222 0.59375    0.83076923 0.84615385 0.88405797
 0.86764706 0.34375    0.62745098 0.90909091]

mean value: 0.7023442943104068

MCC on Blind test: 0.26

Accuracy on Blind test: 0.62

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11391807 0.09409237 0.09487605 0.09567189 0.09379506 0.09356427
 0.09453201 0.09791088 0.09554052 0.09554005]

mean value: 0.09694411754608154

key: score_time
value: [0.01464009 0.0145905  0.01488519 0.01464534 0.01463294 0.01461124
 0.01467419 0.0150106  0.01514769 0.01581359]

mean value: 0.01486513614654541

key: test_mcc
value: [0.66143783 0.87287156 1.         1.         0.74535599 0.8660254
 0.52223297 0.28867513 0.8660254  0.57735027]

mean value: 0.7399974560430457

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        0.93333333 1.         1.         0.85714286 0.92857143
 0.71428571 0.64285714 0.92857143 0.78571429]

mean value: 0.8590476190476191

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.94117647 1.         1.         0.83333333 0.92307692
 0.6        0.66666667 0.93333333 0.76923077]

mean value: 0.8490346907993966

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.88888889 1.         1.         1.         1.
 1.         0.625      0.875      0.83333333]

mean value: 0.8922222222222222

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.71428571 0.85714286
 0.42857143 0.71428571 1.         0.71428571]

mean value: 0.8428571428571429

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.92857143 1.         1.         0.85714286 0.92857143
 0.71428571 0.64285714 0.92857143 0.78571429]

mean value: 0.8598214285714286

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.88888889 1.         1.         0.71428571 0.85714286
 0.42857143 0.5        0.875      0.625     ]

mean value: 0.7588888888888888

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: -0.0

Accuracy on Blind test: 0.5

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03230977 0.03465533 0.05089736 0.04256916 0.04040456 0.05487514
 0.04169941 0.04757524 0.03255701 0.04550004]

mean value: 0.04230430126190186

key: score_time
value: [0.01723957 0.02700257 0.02792835 0.0355022  0.02915096 0.02598453
 0.03048086 0.01950192 0.01615834 0.01654243]

mean value: 0.024549174308776855

key: test_mcc
value: [0.76376262 0.87287156 1.         1.         0.57735027 0.8660254
 0.74535599 0.1490712  0.8660254  0.57735027]

mean value: 0.7417812713717987

key: train_mcc
value: [1.         1.         0.98449518 0.95324137 0.95324137 0.96922337
 0.98449518 0.96922337 0.96922337 1.        ]

mean value: 0.9783143216676922

key: test_accuracy
value: [0.86666667 0.93333333 1.         1.         0.78571429 0.92857143
 0.85714286 0.57142857 0.92857143 0.78571429]

mean value: 0.8657142857142857

key: train_accuracy
value: [1.        1.        0.9921875 0.9765625 0.9765625 0.984375  0.9921875
 0.984375  0.984375  1.       ]

mean value: 0.9890625

key: test_fscore
value: [0.875      0.94117647 1.         1.         0.76923077 0.93333333
 0.83333333 0.625      0.93333333 0.76923077]

mean value: 0.8679638009049774

key: train_fscore
value: [1.         1.         0.99212598 0.97637795 0.97674419 0.98412698
 0.99224806 0.98412698 0.98412698 1.        ]

mean value: 0.9889877137450842

key: test_precision
value: [0.77777778 0.88888889 1.         1.         0.83333333 0.875
 1.         0.55555556 0.875      0.83333333]

mean value: 0.8638888888888889

key: train_precision
value: [1.         1.         1.         0.98412698 0.96923077 1.
 0.98461538 1.         1.         1.        ]

mean value: 0.9937973137973138

key: test_recall
value: [1.         1.         1.         1.         0.71428571 1.
 0.71428571 0.71428571 1.         0.71428571]

mean value: 0.8857142857142857

key: train_recall
value: [1.       1.       0.984375 0.96875  0.984375 0.96875  1.       0.96875
 0.96875  1.      ]

mean value: 0.984375

key: test_roc_auc
value: [0.875      0.92857143 1.         1.         0.78571429 0.92857143
 0.85714286 0.57142857 0.92857143 0.78571429]

mean value: 0.8660714285714286

key: train_roc_auc
value: [1.        1.        0.9921875 0.9765625 0.9765625 0.984375  0.9921875
 0.984375  0.984375  1.       ]

mean value: 0.9890625

key: test_jcc
value: [0.77777778 0.88888889 1.         1.         0.625      0.875
 0.71428571 0.45454545 0.875      0.625     ]

mean value: 0.7835497835497836

key: train_jcc
value: [1.         1.         0.984375   0.95384615 0.95454545 0.96875
 0.98461538 0.96875    0.96875    1.        ]

mean value: 0.9783631993006994

MCC on Blind test: 0.06

Accuracy on Blind test: 0.52

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.03254342 0.05661488 0.04824209 0.05007434 0.06980419 0.04639006
 0.03822303 0.04851413 0.0455544  0.03839159]

mean value: 0.047435212135314944

key: score_time
value: [0.02400923 0.0254271  0.02423596 0.02527332 0.02484155 0.02546525
 0.02542686 0.02713227 0.02601194 0.02398038]

mean value: 0.02518038749694824

key: test_mcc
value: [ 0.32732684 -0.19642857  0.4472136   0.63245553  0.28867513  0.1490712
  0.28867513  0.14285714 -0.4472136   0.        ]

mean value: 0.16326324065058476

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.66666667 0.4        0.71428571 0.78571429 0.64285714 0.57142857
 0.64285714 0.57142857 0.28571429 0.5       ]

mean value: 0.5780952380952381

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.61538462 0.4        0.66666667 0.72727273 0.66666667 0.5
 0.61538462 0.57142857 0.16666667 0.53333333]

mean value: 0.5462803862803862

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.66666667 0.42857143 0.8        1.         0.625      0.6
 0.66666667 0.57142857 0.2        0.5       ]

mean value: 0.6058333333333333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.57142857 0.375      0.57142857 0.57142857 0.71428571 0.42857143
 0.57142857 0.57142857 0.14285714 0.57142857]

mean value: 0.5089285714285714

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.66071429 0.40178571 0.71428571 0.78571429 0.64285714 0.57142857
 0.64285714 0.57142857 0.28571429 0.5       ]

mean value: 0.5776785714285714

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.44444444 0.25       0.5        0.57142857 0.5        0.33333333
 0.44444444 0.4        0.09090909 0.36363636]

mean value: 0.3898196248196248

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.27084661 0.25251555 0.25915909 0.25225282 0.25605011 0.25171423
 0.25446415 0.25518131 0.25559974 0.24836516]

mean value: 0.25561487674713135

key: score_time
value: [0.00921893 0.00908136 0.00908971 0.0089159  0.00926518 0.00900149
 0.00906825 0.00937915 0.00925112 0.00913453]

mean value: 0.009140563011169434

key: test_mcc
value: [0.66143783 0.87287156 1.         1.         0.71428571 0.8660254
 0.74535599 0.31622777 0.8660254  0.42857143]

mean value: 0.7470801097652905

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.8        0.93333333 1.         1.         0.85714286 0.92857143
 0.85714286 0.64285714 0.92857143 0.71428571]

mean value: 0.8661904761904762

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.94117647 1.         1.         0.85714286 0.93333333
 0.83333333 0.70588235 0.93333333 0.71428571]

mean value: 0.8742016806722689

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.7        0.88888889 1.         1.         0.85714286 0.875
 1.         0.6        0.875      0.71428571]

mean value: 0.851031746031746

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         1.         1.         0.85714286 1.
 0.71428571 0.85714286 1.         0.71428571]

mean value: 0.9142857142857143

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8125     0.92857143 1.         1.         0.85714286 0.92857143
 0.85714286 0.64285714 0.92857143 0.71428571]

mean value: 0.8669642857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.88888889 1.         1.         0.75       0.875
 0.71428571 0.54545455 0.875      0.55555556]

mean value: 0.7904184704184705

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.52

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01737309 0.016608   0.01730108 0.01654887 0.04343247 0.01705122
 0.02059937 0.01725125 0.01745582 0.01699615]

mean value: 0.020061731338500977

key: score_time
value: [0.01210737 0.0118475  0.01199269 0.01186085 0.01219201 0.01192856
 0.01498175 0.01467967 0.01503849 0.01461554]

mean value: 0.013124442100524903

key: test_mcc
value: [ 0.05455447  0.20044593  0.14285714 -0.40824829 -0.14285714 -0.14285714
 -0.28867513 -0.14285714 -0.1490712   0.        ]

mean value: -0.08767085052796311

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.53333333 0.6        0.57142857 0.35714286 0.42857143 0.42857143
 0.35714286 0.42857143 0.42857143 0.5       ]

mean value: 0.4633333333333333

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.46153846 0.7        0.57142857 0.52631579 0.42857143 0.42857143
 0.30769231 0.42857143 0.5        0.46153846]

mean value: 0.4814227877385772

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.5        0.58333333 0.57142857 0.41666667 0.42857143 0.42857143
 0.33333333 0.42857143 0.44444444 0.5       ]

mean value: 0.4634920634920635

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.42857143 0.875      0.57142857 0.71428571 0.42857143 0.42857143
 0.28571429 0.42857143 0.57142857 0.42857143]

mean value: 0.5160714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.52678571 0.58035714 0.57142857 0.35714286 0.42857143 0.42857143
 0.35714286 0.42857143 0.42857143 0.5       ]

mean value: 0.4607142857142857

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.3        0.53846154 0.4        0.35714286 0.27272727 0.27272727
 0.18181818 0.27272727 0.33333333 0.3       ]

mean value: 0.32289377289377286

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.03

Accuracy on Blind test: 0.51

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03756571 0.01294708 0.01308179 0.01303172 0.01298833 0.01315022
 0.01306343 0.012995   0.01294088 0.02156067]

mean value: 0.016332483291625975

key: score_time
value: [0.0116086  0.01149893 0.01147532 0.01145434 0.01147461 0.01149917
 0.01148558 0.0114975  0.0114572  0.01152682]

mean value: 0.011497807502746583

key: test_mcc
value: [0.21821789 0.26189246 0.71428571 0.74535599 0.74535599 0.8660254
 0.8660254  0.4472136  0.28867513 0.4472136 ]

mean value: 0.5600261185993421

key: train_mcc
value: [0.93748452 0.93748452 0.92288947 0.89073374 0.89073374 0.90625
 0.85947992 0.95324137 0.95417386 0.9379581 ]

mean value: 0.9190429255191599

key: test_accuracy
value: [0.6        0.6        0.85714286 0.85714286 0.85714286 0.92857143
 0.92857143 0.71428571 0.64285714 0.71428571]

mean value: 0.77

key: train_accuracy
value: [0.96850394 0.96850394 0.9609375  0.9453125  0.9453125  0.953125
 0.9296875  0.9765625  0.9765625  0.96875   ]

mean value: 0.9593257874015748

key: test_fscore
value: [0.625      0.5        0.85714286 0.83333333 0.83333333 0.92307692
 0.92307692 0.75       0.66666667 0.66666667]

mean value: 0.7578296703296703

key: train_fscore
value: [0.96825397 0.96875    0.96183206 0.94573643 0.94573643 0.953125
 0.92913386 0.97674419 0.97709924 0.96923077]

mean value: 0.9595641947725944

key: test_precision
value: [0.55555556 0.75       0.85714286 1.         1.         1.
 1.         0.66666667 0.625      0.8       ]

mean value: 0.8254365079365079

key: train_precision
value: [0.98387097 0.95384615 0.94029851 0.93846154 0.93846154 0.953125
 0.93650794 0.96923077 0.95522388 0.95454545]

mean value: 0.9523571746855028

key: test_recall
value: [0.71428571 0.375      0.85714286 0.71428571 0.71428571 0.85714286
 0.85714286 0.85714286 0.71428571 0.57142857]

mean value: 0.7232142857142857

/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:175: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:178: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_recall
value: [0.953125   0.98412698 0.984375   0.953125   0.953125   0.953125
 0.921875   0.984375   1.         0.984375  ]

mean value: 0.9671626984126984

key: test_roc_auc
value: [0.60714286 0.61607143 0.85714286 0.85714286 0.85714286 0.92857143
 0.92857143 0.71428571 0.64285714 0.71428571]

mean value: 0.7723214285714286

key: train_roc_auc
value: [0.96862599 0.96862599 0.9609375  0.9453125  0.9453125  0.953125
 0.9296875  0.9765625  0.9765625  0.96875   ]

mean value: 0.9593501984126984

key: test_jcc
value: [0.45454545 0.33333333 0.75       0.71428571 0.71428571 0.85714286
 0.85714286 0.6        0.5        0.5       ]

mean value: 0.628073593073593

key: train_jcc
value: [0.93846154 0.93939394 0.92647059 0.89705882 0.89705882 0.91044776
 0.86764706 0.95454545 0.95522388 0.94029851]

mean value: 0.922660637577231

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.17722702 0.11350346 0.21813035 0.2076714  0.21254301 0.24001646
 0.32121396 0.21115351 0.19236588 0.18778825]

mean value: 0.2081613302230835

key: score_time
value: [0.0201118  0.01175857 0.01412797 0.02275658 0.02106333 0.02480865
 0.02300787 0.02113318 0.0210979  0.0177002 ]

mean value: 0.019756603240966796

key: test_mcc
value: [0.21821789 0.26189246 0.71428571 0.74535599 0.74535599 0.8660254
 0.8660254  0.4472136  0.28867513 0.4472136 ]

mean value: 0.5600261185993421

key: train_mcc
value: [0.93748452 0.93748452 0.92288947 0.89073374 0.89073374 0.90625
 0.85947992 0.95324137 0.95417386 0.9379581 ]

mean value: 0.9190429255191599

key: test_accuracy
value: [0.6        0.6        0.85714286 0.85714286 0.85714286 0.92857143
 0.92857143 0.71428571 0.64285714 0.71428571]

mean value: 0.77

key: train_accuracy
value: [0.96850394 0.96850394 0.9609375  0.9453125  0.9453125  0.953125
 0.9296875  0.9765625  0.9765625  0.96875   ]

mean value: 0.9593257874015748

key: test_fscore
value: [0.625      0.5        0.85714286 0.83333333 0.83333333 0.92307692
 0.92307692 0.75       0.66666667 0.66666667]

mean value: 0.7578296703296703

key: train_fscore
value: [0.96825397 0.96875    0.96183206 0.94573643 0.94573643 0.953125
 0.92913386 0.97674419 0.97709924 0.96923077]

mean value: 0.9595641947725944

key: test_precision
value: [0.55555556 0.75       0.85714286 1.         1.         1.
 1.         0.66666667 0.625      0.8       ]

mean value: 0.8254365079365079

key: train_precision
value: [0.98387097 0.95384615 0.94029851 0.93846154 0.93846154 0.953125
 0.93650794 0.96923077 0.95522388 0.95454545]

mean value: 0.9523571746855028

key: test_recall
value: [0.71428571 0.375      0.85714286 0.71428571 0.71428571 0.85714286
 0.85714286 0.85714286 0.71428571 0.57142857]

mean value: 0.7232142857142857

key: train_recall
value: [0.953125   0.98412698 0.984375   0.953125   0.953125   0.953125
 0.921875   0.984375   1.         0.984375  ]

mean value: 0.9671626984126984

key: test_roc_auc
value: [0.60714286 0.61607143 0.85714286 0.85714286 0.85714286 0.92857143
 0.92857143 0.71428571 0.64285714 0.71428571]

mean value: 0.7723214285714286

key: train_roc_auc
value: [0.96862599 0.96862599 0.9609375  0.9453125  0.9453125  0.953125
 0.9296875  0.9765625  0.9765625  0.96875   ]

mean value: 0.9593501984126984

key: test_jcc
value: [0.45454545 0.33333333 0.75       0.71428571 0.71428571 0.85714286
 0.85714286 0.6        0.5        0.5       ]

mean value: 0.628073593073593

key: train_jcc
value: [0.93846154 0.93939394 0.92647059 0.89705882 0.89705882 0.91044776
 0.86764706 0.95454545 0.95522388 0.94029851]

mean value: 0.922660637577231

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03295398 0.03293633 0.04036927 0.07436967 0.05748558 0.04356003
 0.07211637 0.03021264 0.03299451 0.03802323]

mean value: 0.04550216197967529

key: score_time
value: [0.01606202 0.01162028 0.0115943  0.01190186 0.01204824 0.02361083
 0.02349329 0.01182771 0.01200891 0.01200128]

mean value: 0.014616870880126953

key: test_mcc
value: [0.39393939 0.66414149 0.65909298 0.48075018 0.74242424 0.74047959
 0.74047959 0.74047959 0.56694671 0.48795004]

mean value: 0.6216683798300241

key: train_mcc
value: [0.80500813 0.85463818 0.89371934 0.86356283 0.84407425 0.86358877
 0.86493273 0.88292404 0.85473156 0.87481777]

mean value: 0.8601997596512527

key: test_accuracy
value: [0.69565217 0.82608696 0.82608696 0.73913043 0.86956522 0.86956522
 0.86956522 0.86956522 0.77272727 0.72727273]

mean value: 0.8065217391304348

key: train_accuracy
value: [0.90243902 0.92682927 0.94634146 0.93170732 0.92195122 0.93170732
 0.93170732 0.94146341 0.92718447 0.9368932 ]

mean value: 0.9298224011366327

key: test_fscore
value: [0.69565217 0.83333333 0.8        0.7        0.86956522 0.88
 0.88       0.88       0.8        0.66666667]

mean value: 0.8005217391304347

key: train_fscore
value: [0.90384615 0.92890995 0.9478673  0.93269231 0.9223301  0.93203883
 0.93333333 0.94117647 0.92822967 0.93838863]

mean value: 0.9308812739347887

key: test_precision
value: [0.66666667 0.76923077 0.88888889 0.77777778 0.90909091 0.84615385
 0.84615385 0.84615385 0.71428571 0.85714286]

mean value: 0.8121545121545122

key: train_precision
value: [0.8952381  0.90740741 0.92592593 0.92380952 0.91346154 0.92307692
 0.90740741 0.94117647 0.91509434 0.91666667]

mean value: 0.9169264298204365

key: test_recall
value: [0.72727273 0.90909091 0.72727273 0.63636364 0.83333333 0.91666667
 0.91666667 0.91666667 0.90909091 0.54545455]

mean value: 0.8037878787878787

key: train_recall
value: [0.91262136 0.95145631 0.97087379 0.94174757 0.93137255 0.94117647
 0.96078431 0.94117647 0.94174757 0.96116505]

mean value: 0.9454121454407005

key: test_roc_auc
value: [0.6969697  0.82954545 0.8219697  0.73484848 0.87121212 0.86742424
 0.86742424 0.86742424 0.77272727 0.72727273]

mean value: 0.8056818181818182

key: train_roc_auc
value: [0.90238911 0.92670855 0.94622121 0.9316581  0.92199695 0.93175328
 0.93184847 0.94146202 0.92718447 0.9368932 ]

mean value: 0.9298115362649915

key: test_jcc
value: [0.53333333 0.71428571 0.66666667 0.53846154 0.76923077 0.78571429
 0.78571429 0.78571429 0.66666667 0.5       ]

mean value: 0.6745787545787546

key: train_jcc
value: [0.8245614  0.86725664 0.9009009  0.87387387 0.85585586 0.87272727
 0.875      0.88888889 0.86607143 0.88392857]

mean value: 0.8709064832923705

MCC on Blind test: 0.34

Accuracy on Blind test: 0.67
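
The per-fold scores above can be cross-checked by hand. The sketch below recomputes the first-fold test metrics from a confusion matrix. The counts (TP=8, FP=4, FN=3, TN=8) are not printed in this log; they are an assumption inferred from the reported first-fold accuracy, precision and recall, and `binary_metrics` is a hypothetical helper, not a function from the original script.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Return common binary-classification scores from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # MCC denominator; zero when any row or column of the matrix is empty.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Assumed counts for the first test fold (23 samples), inferred from the log.
fold1 = binary_metrics(tp=8, fp=4, fn=3, tn=8)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With these assumed counts the results agree with the first entries of test_mcc, test_accuracy, test_fscore, test_precision, test_recall and test_jcc. The balanced accuracy also agrees with the first test_roc_auc entry, which is what ROC AUC reduces to when it is computed from hard class labels rather than probabilities; whether the script scored it that way is likewise an assumption.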

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
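
This warning is raised once for every solver run that reaches its iteration limit, so a cross-validated model can emit it many times. One way to keep a log readable is to report each distinct warning only once, using the standard-library `warnings` filters. The sketch below is stdlib-only: `ConvergenceWarning` here is a stand-in class (the real one is `sklearn.exceptions.ConvergenceWarning`), and `fit_once` and `run_folds` are hypothetical placeholders, not functions from the original script.

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Stand-in for sklearn.exceptions.ConvergenceWarning (assumption: stdlib-only sketch)."""


def fit_once(fold):
    # Hypothetical placeholder for one solver run that hits its iteration limit.
    warnings.warn("lbfgs failed to converge (status=1)", ConvergenceWarning)
    return fold


def run_folds(n_folds):
    """Run n_folds placeholder fits and return the warnings that were actually reported."""
    with warnings.catch_warnings(record=True) as caught:
        # "once" reports the first occurrence of each distinct message and drops the rest.
        warnings.simplefilter("once", ConvergenceWarning)
        for fold in range(n_folds):
            fit_once(fold)
    return caught


reported = run_folds(10)
print(f"fits run: 10, warnings reported: {len(reported)}")
```

Filtering only hides the repetition. The warning text itself points at the underlying remedies: a higher `max_iter` or a different solver.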
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.93171263 0.76386046 0.90062022 0.77434134 0.75542283 0.82477236
 0.75631881 0.79757857 0.92382717 0.79171753]

mean value: 0.8220171928405762

key: score_time
value: [0.01185131 0.0120995  0.02216887 0.01233625 0.01512313 0.01532745
 0.01551938 0.01230907 0.01238847 0.01561093]

mean value: 0.014473438262939453

key: test_mcc
value: [0.82575758 0.74047959 0.76277007 0.56818182 0.76764947 0.82575758
 0.74242424 0.91666667 0.54772256 0.75592895]

mean value: 0.7453338517356568

key: train_mcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.91369855 1.        ]

mean value: 0.9913698554847693

key: test_accuracy
value: [0.91304348 0.86956522 0.86956522 0.7826087  0.86956522 0.91304348
 0.86956522 0.95652174 0.77272727 0.86363636]

mean value: 0.8679841897233201

key: train_accuracy
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.95631068 1.        ]

mean value: 0.9956310679611651

key: test_fscore
value: [0.90909091 0.85714286 0.84210526 0.7826087  0.85714286 0.91666667
 0.86956522 0.95652174 0.7826087  0.84210526]

mean value: 0.8615558164185166

key: train_fscore
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.95734597 1.        ]

mean value: 0.9957345971563981

key: test_precision
value: [0.90909091 0.9        1.         0.75       1.         0.91666667
 0.90909091 1.         0.75       1.        ]

mean value: 0.9134848484848485

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.93518519 1.        ]

mean value: 0.9935185185185185

key: test_recall
value: [0.90909091 0.81818182 0.72727273 0.81818182 0.75       0.91666667
 0.83333333 0.91666667 0.81818182 0.72727273]

mean value: 0.8234848484848485

key: train_recall
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.98058252 1.        ]

mean value: 0.9980582524271845

key: test_roc_auc
value: [0.91287879 0.86742424 0.86363636 0.78409091 0.875      0.91287879
 0.87121212 0.95833333 0.77272727 0.86363636]

mean value: 0.8681818181818182

key: train_roc_auc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.95631068 1.        ]

mean value: 0.9956310679611651

key: test_jcc
value: [0.83333333 0.75       0.72727273 0.64285714 0.75       0.84615385
 0.76923077 0.91666667 0.64285714 0.72727273]

mean value: 0.7605644355644355

key: train_jcc
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.91818182 1.        ]

mean value: 0.9918181818181818

MCC on Blind test: 0.19

Accuracy on Blind test: 0.59
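
The per-fold scores logged for LogisticRegressionCV above can be cross-checked by hand. Below is a minimal sketch, assuming the eighth fold held 23 test samples with TP=11, FN=1, FP=0, TN=11. These counts are not printed in the log; they are inferred here from that fold's logged accuracy (0.95652174), precision (1.0) and recall (0.91666667). The helper name `binary_metrics` is hypothetical and is not part of the original scripts.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Return MCC, F-score, Jaccard and accuracy from binary confusion counts."""
    # MCC denominator: product of the four marginal totals, square-rooted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    positives = 2 * tp + fp + fn
    fscore = 2 * tp / positives if positives else 0.0
    union = tp + fp + fn
    jcc = tp / union if union else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"mcc": mcc, "fscore": fscore, "jcc": jcc, "accuracy": accuracy}

# Assumed confusion counts for fold 8 (inferred from the logged metrics, not logged directly)
fold8 = binary_metrics(tp=11, fn=1, fp=0, tn=11)
for name, value in fold8.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the recomputed MCC, F-score and Jaccard values agree with the eighth entries of test_mcc, test_fscore and test_jcc, which suggests the logged metrics are mutually consistent for that fold.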

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.0132544  0.01033616 0.00911188 0.009161   0.00958657 0.00882697
 0.00929618 0.00892305 0.0093689  0.00876236]

mean value: 0.009662747383117676

key: score_time
value: [0.01657176 0.00902963 0.00913548 0.00981712 0.00965953 0.00855589
 0.0086391  0.00849652 0.00863767 0.008816  ]

mean value: 0.009735870361328124

key: test_mcc
value: [0.11236664 0.56490196 0.65151515 0.06579517 0.22407133 0.50168817
 0.58002308 0.42228828 0.48795004 0.09759001]

mean value: 0.37081898188601464

key: train_mcc
value: [0.41031528 0.49366174 0.51698955 0.40881923 0.40551208 0.45203295
 0.49026396 0.44322953 0.45669396 0.43639645]

mean value: 0.45139147236435284

key: test_accuracy
value: [0.52173913 0.7826087  0.82608696 0.52173913 0.60869565 0.73913043
 0.7826087  0.69565217 0.72727273 0.54545455]

mean value: 0.675098814229249

key: train_accuracy
value: [0.68292683 0.74634146 0.74146341 0.67317073 0.68780488 0.71707317
 0.73170732 0.71219512 0.7184466  0.70873786]

mean value: 0.7119867392848686

key: test_fscore
value: [0.64516129 0.76190476 0.81818182 0.59259259 0.68965517 0.78571429
 0.81481481 0.75862069 0.76923077 0.61538462]

mean value: 0.7251260810215204

key: train_fscore
value: [0.743083   0.74       0.781893   0.74329502 0.73553719 0.75
 0.76793249 0.74678112 0.75423729 0.74576271]

mean value: 0.7508521822638834

key: test_precision
value: [0.5        0.8        0.81818182 0.5        0.58823529 0.6875
 0.73333333 0.64705882 0.66666667 0.53333333]

mean value: 0.6474309269162211

key: train_precision
value: [0.62666667 0.7628866  0.67857143 0.61392405 0.63571429 0.66923077
 0.67407407 0.66412214 0.66917293 0.66165414]

mean value: 0.6656017077902033

key: test_recall
value: [0.90909091 0.72727273 0.81818182 0.72727273 0.83333333 0.91666667
 0.91666667 0.91666667 0.90909091 0.72727273]

mean value: 0.8401515151515151

key: train_recall
value: [0.91262136 0.7184466  0.9223301  0.94174757 0.87254902 0.85294118
 0.89215686 0.85294118 0.86407767 0.85436893]

mean value: 0.8684180468303826

key: test_roc_auc
value: [0.53787879 0.78030303 0.82575758 0.53030303 0.59848485 0.73106061
 0.77651515 0.68560606 0.72727273 0.54545455]

mean value: 0.6738636363636363

key: train_roc_auc
value: [0.68180088 0.7464782  0.74057681 0.67185418 0.68870169 0.71773272
 0.7324862  0.71287836 0.7184466  0.70873786]

mean value: 0.711969350847135

key: test_jcc
value: [0.47619048 0.61538462 0.69230769 0.42105263 0.52631579 0.64705882
 0.6875     0.61111111 0.625      0.44444444]

mean value: 0.5746365584020383

key: train_jcc
value: [0.59119497 0.58730159 0.64189189 0.59146341 0.58169935 0.6
 0.62328767 0.59589041 0.60544218 0.59459459]

mean value: 0.6012766062443438

MCC on Blind test: 0.45

Accuracy on Blind test: 0.71

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01010966 0.00937247 0.0098629  0.00893044 0.00894618 0.0098803
 0.00925088 0.00918436 0.00915742 0.00916314]

mean value: 0.009385776519775391

key: score_time
value: [0.00906849 0.00885463 0.00871778 0.00924611 0.00887156 0.00888371
 0.00867105 0.00927925 0.00878453 0.00857878]

mean value: 0.008895587921142579

key: test_mcc
value: [0.21969697 0.55048188 0.22407133 0.21452908 0.3030303  0.3030303
 0.33371191 0.39393939 0.09090909 0.32539569]

mean value: 0.29587959510446155

key: train_mcc
value: [0.44146616 0.44911432 0.45709726 0.49637007 0.4861007  0.48652841
 0.43786483 0.44832571 0.49218702 0.50892419]

mean value: 0.4703978666309494

key: test_accuracy
value: [0.60869565 0.73913043 0.60869565 0.60869565 0.65217391 0.65217391
 0.65217391 0.69565217 0.54545455 0.63636364]

mean value: 0.6399209486166008

key: train_accuracy
value: [0.71707317 0.72195122 0.72682927 0.74634146 0.73658537 0.74146341
 0.71707317 0.72195122 0.74271845 0.75242718]

mean value: 0.7324413923750888

key: test_fscore
value: [0.60869565 0.625      0.47058824 0.52631579 0.66666667 0.66666667
 0.6        0.69565217 0.54545455 0.5       ]

mean value: 0.5905039729642637

key: train_fscore
value: [0.69148936 0.70157068 0.71134021 0.73195876 0.7        0.72251309
 0.69473684 0.6984127  0.71957672 0.7357513 ]

mean value: 0.7107349655839269

key: test_precision
value: [0.58333333 1.         0.66666667 0.625      0.66666667 0.66666667
 0.75       0.72727273 0.54545455 0.8       ]

mean value: 0.7031060606060606

key: train_precision
value: [0.76470588 0.76136364 0.75824176 0.78021978 0.80769231 0.7752809
 0.75       0.75862069 0.79069767 0.78888889]

mean value: 0.7735711516709494

key: test_recall
value: [0.63636364 0.45454545 0.36363636 0.45454545 0.66666667 0.66666667
 0.5        0.66666667 0.54545455 0.36363636]

mean value: 0.5318181818181817

key: train_recall
value: [0.63106796 0.65048544 0.66990291 0.68932039 0.61764706 0.67647059
 0.64705882 0.64705882 0.66019417 0.68932039]

mean value: 0.657852655625357

key: test_roc_auc
value: [0.60984848 0.72727273 0.59848485 0.60227273 0.65151515 0.65151515
 0.65909091 0.6969697  0.54545455 0.63636364]

mean value: 0.6378787878787878

key: train_roc_auc
value: [0.71749476 0.72230154 0.72710832 0.74662098 0.736008   0.74114792
 0.7167333  0.72158766 0.74271845 0.75242718]

mean value: 0.732414810584428

key: test_jcc
value: [0.4375     0.45454545 0.30769231 0.35714286 0.5        0.5
 0.42857143 0.53333333 0.375      0.33333333]

mean value: 0.42271187146187145

key: train_jcc
value: [0.52845528 0.54032258 0.552      0.57723577 0.53846154 0.56557377
 0.53225806 0.53658537 0.56198347 0.58196721]

mean value: 0.5514843061067994

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.0090549  0.00948048 0.00953913 0.00963783 0.00954986 0.00989366
 0.00965118 0.00976753 0.00973725 0.00958776]

mean value: 0.009589958190917968

key: score_time
value: [0.01488662 0.01073503 0.01059341 0.01062608 0.01254487 0.01061916
 0.01115012 0.01079106 0.01064491 0.01087499]

mean value: 0.01134662628173828

key: test_mcc
value: [ 0.04545455  0.03178209 -0.06579517  0.30240737  0.15096491 -0.31298622
  0.13740858  0.31252706  0.18898224 -0.09245003]

mean value: 0.06982953647576104

key: train_mcc
value: [0.58048549 0.45409531 0.47798272 0.45409531 0.51440766 0.52244835
 0.48780456 0.43416169 0.52548679 0.56526885]

mean value: 0.5016236722758309

key: test_accuracy
value: [0.52173913 0.52173913 0.47826087 0.65217391 0.56521739 0.34782609
 0.56521739 0.65217391 0.59090909 0.45454545]

mean value: 0.5349802371541502

key: train_accuracy
value: [0.7902439  0.72682927 0.73658537 0.72682927 0.75609756 0.76097561
 0.74146341 0.71707317 0.76213592 0.7815534 ]

mean value: 0.7499786881363959

key: test_fscore
value: [0.52173913 0.42105263 0.33333333 0.6        0.5        0.4
 0.54545455 0.71428571 0.52631579 0.4       ]

mean value: 0.4962181144561007

key: train_fscore
value: [0.79227053 0.72277228 0.71875    0.72277228 0.74226804 0.75376884
 0.71957672 0.71287129 0.75376884 0.7715736 ]

mean value: 0.7410392426302083

key: test_precision
value: [0.5        0.5        0.42857143 0.66666667 0.625      0.38461538
 0.6        0.625      0.625      0.44444444]

mean value: 0.5399297924297924

key: train_precision
value: [0.78846154 0.73737374 0.7752809  0.73737374 0.7826087  0.77319588
 0.7816092  0.72       0.78125    0.80851064]

mean value: 0.7685664317726423

key: test_recall
value: [0.54545455 0.36363636 0.27272727 0.54545455 0.41666667 0.41666667
 0.5        0.83333333 0.45454545 0.36363636]

mean value: 0.4712121212121212

key: train_recall
value: [0.7961165  0.70873786 0.66990291 0.70873786 0.70588235 0.73529412
 0.66666667 0.70588235 0.72815534 0.73786408]

mean value: 0.7163240053302875

key: test_roc_auc
value: [0.52272727 0.51515152 0.46969697 0.64772727 0.5719697  0.34469697
 0.56818182 0.64393939 0.59090909 0.45454545]

mean value: 0.5329545454545455

key: train_roc_auc
value: [0.79021512 0.72691795 0.73691224 0.72691795 0.7558538  0.76085094
 0.74110032 0.71701885 0.76213592 0.7815534 ]

mean value: 0.7499476489624977

key: test_jcc
value: [0.35294118 0.26666667 0.2        0.42857143 0.33333333 0.25
 0.375      0.55555556 0.35714286 0.25      ]

mean value: 0.33692110177404294

key: train_jcc
value: [0.656      0.56589147 0.56097561 0.56589147 0.59016393 0.60483871
 0.56198347 0.55384615 0.60483871 0.62809917]

mean value: 0.5892528707747853

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56
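
The "mean value" lines and the gap between training and cross-validation scores can be recomputed directly from the logged arrays. Below is a rough sketch using only the standard library. The function name `parse_fold_values` is hypothetical, and the two arrays are copied from the K-Nearest Neighbors test_mcc and train_mcc entries above.

```python
import re
from statistics import mean

def parse_fold_values(text):
    """Extract floats from a NumPy-style array string such as '[0.1 -0.2 1. ]'."""
    pattern = r"-?\d+\.?\d*(?:[eE][-+]?\d+)?"
    return [float(token) for token in re.findall(pattern, text)]

# Per-fold values copied from the K-Nearest Neighbors block of this log
test_mcc_text = """[ 0.04545455  0.03178209 -0.06579517  0.30240737  0.15096491 -0.31298622
  0.13740858  0.31252706  0.18898224 -0.09245003]"""
train_mcc_text = """[0.58048549 0.45409531 0.47798272 0.45409531 0.51440766 0.52244835
 0.48780456 0.43416169 0.52548679 0.56526885]"""

test_mcc = parse_fold_values(test_mcc_text)
train_mcc = parse_fold_values(train_mcc_text)

test_mean = mean(test_mcc)
train_mean = mean(train_mcc)
gap = train_mean - test_mean  # a large positive gap hints at overfitting

print(f"test_mcc mean:  {test_mean:.6f}")
print(f"train_mcc mean: {train_mean:.6f}")
print(f"train - test:   {gap:.6f}")
```

The same parsing could be applied to any other "value:" block in this log. The recomputed means match the logged ones only up to the eight decimal places printed for each fold.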

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01453853 0.01178885 0.01168919 0.01257873 0.01363349 0.01344967
 0.01352501 0.01355839 0.0126698  0.01276493]

mean value: 0.013019657135009766

key: score_time
value: [0.01078439 0.00963712 0.00947428 0.01007676 0.01063275 0.01030612
 0.01044083 0.01055479 0.01006365 0.00953841]

mean value: 0.010150909423828125

key: test_mcc
value: [0.31298622 0.74242424 0.50168817 0.12878788 0.66414149 0.30240737
 0.38932432 0.65151515 0.27272727 0.27272727]

mean value: 0.42387293810042725

key: train_mcc
value: [0.72894414 0.76647632 0.80552394 0.73821604 0.76638754 0.78600013
 0.77647587 0.7954287  0.71848046 0.738735  ]

mean value: 0.7620668135550838

key: test_accuracy
value: [0.65217391 0.86956522 0.73913043 0.56521739 0.82608696 0.65217391
 0.69565217 0.82608696 0.63636364 0.63636364]

mean value: 0.7098814229249012

key: train_accuracy
value: [0.86341463 0.88292683 0.90243902 0.86829268 0.88292683 0.89268293
 0.88780488 0.89756098 0.8592233  0.86893204]

mean value: 0.880620412029363

key: test_fscore
value: [0.66666667 0.86956522 0.66666667 0.54545455 0.81818182 0.69230769
 0.72       0.83333333 0.63636364 0.63636364]

mean value: 0.70849032127293

key: train_fscore
value: [0.86915888 0.88118812 0.9009901  0.87323944 0.88       0.89423077
 0.88442211 0.89552239 0.85853659 0.87203791]

mean value: 0.8809326300847204

key: test_precision
value: [0.61538462 0.83333333 0.85714286 0.54545455 0.9        0.64285714
 0.69230769 0.83333333 0.63636364 0.63636364]

mean value: 0.7192540792540792

key: train_precision
value: [0.83783784 0.8989899  0.91919192 0.84545455 0.89795918 0.87735849
 0.90721649 0.90909091 0.8627451  0.85185185]

mean value: 0.8807696229541047

key: test_recall
value: [0.72727273 0.90909091 0.54545455 0.54545455 0.75       0.75
 0.75       0.83333333 0.63636364 0.63636364]

mean value: 0.7083333333333334

key: train_recall
value: [0.90291262 0.86407767 0.88349515 0.90291262 0.8627451  0.91176471
 0.8627451  0.88235294 0.85436893 0.89320388]

mean value: 0.8820578716923663

key: test_roc_auc
value: [0.65530303 0.87121212 0.73106061 0.56439394 0.82954545 0.64772727
 0.69318182 0.82575758 0.63636364 0.63636364]

mean value: 0.709090909090909

key: train_roc_auc
value: [0.86322102 0.88301923 0.90253189 0.86812298 0.88282886 0.89277556
 0.88768323 0.89748715 0.8592233  0.86893204]

mean value: 0.8805825242718447

key: test_jcc
value: [0.5        0.76923077 0.5        0.375      0.69230769 0.52941176
 0.5625     0.71428571 0.46666667 0.46666667]

mean value: 0.5576069273863391

key: train_jcc
value: [0.76859504 0.78761062 0.81981982 0.775      0.78571429 0.80869565
 0.79279279 0.81081081 0.75213675 0.77310924]

mean value: 0.7874285017937194

MCC on Blind test: 0.46

Accuracy on Blind test: 0.73
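
The per-fold metrics above can be cross-checked against each other. The sketch below assumes a confusion matrix for the second SVM fold (TP=10, FP=2, FN=1, TN=10, 23 test samples). These counts are not printed in the log; they are inferred because they reproduce the second entry of every logged `test_*` array. It also assumes that `roc_auc` was scored on hard class predictions, in which case it reduces to the mean of recall and specificity.

```python
import math

# Assumed confusion-matrix counts for the second SVM CV fold (23 samples).
# Inferred from the logged metrics; the log itself does not print them.
tp, fp, fn, tn = 10, 2, 1, 10

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)   # F1 score
jcc = tp / (tp + fp + fn)              # Jaccard index of the positive class
roc_auc = (recall + specificity) / 2   # AUC when only hard labels are scored
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [
    ("test_mcc", mcc),
    ("test_accuracy", accuracy),
    ("test_fscore", fscore),
    ("test_precision", precision),
    ("test_recall", recall),
    ("test_roc_auc", roc_auc),
    ("test_jcc", jcc),
]:
    print(f"{name}: {value:.8f}")
```

Each printed value can be compared with the second element of the corresponding array in the SVM block.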

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [0.32719898 0.17454052 0.83057833 0.54179835 0.57541323 0.82422447
 0.67535782 0.35218549 0.46702242 1.07017779]

mean value: 0.5838497400283813

key: score_time
value: [0.01227832 0.01222968 0.01220989 0.01225519 0.01225281 0.01271915
 0.01260519 0.0125823  0.01266718 0.0126586 ]

mean value: 0.012445831298828125

key: test_mcc
value: [0.44411739 0.12878788 0.58002308 0.30240737 0.56879646 0.56490196
 0.65909298 0.50168817 0.13245324 0.46225016]

mean value: 0.43445186796951096

key: train_mcc
value: [0.52539178 0.50494514 0.92351163 0.73838965 0.65067908 0.85702512
 0.79260855 0.58203168 0.58157543 0.93243443]

mean value: 0.7088592486777825

key: test_accuracy
value: [0.69565217 0.56521739 0.7826087  0.65217391 0.73913043 0.7826087
 0.82608696 0.73913043 0.54545455 0.72727273]

mean value: 0.7055335968379447

key: train_accuracy
value: [0.74634146 0.74634146 0.96097561 0.86829268 0.80487805 0.92682927
 0.88780488 0.7804878  0.75728155 0.96601942]

mean value: 0.8445252190385981

key: test_fscore
value: [0.74074074 0.54545455 0.73684211 0.6        0.66666667 0.8
 0.84615385 0.78571429 0.28571429 0.7       ]

mean value: 0.6707286475707528

key: train_fscore
value: [0.78512397 0.7173913  0.96226415 0.86432161 0.76190476 0.92957746
 0.89777778 0.80519481 0.6835443  0.96650718]

mean value: 0.8373607320770611

key: test_precision
value: [0.625      0.54545455 0.875      0.66666667 1.         0.76923077
 0.78571429 0.6875     0.66666667 0.77777778]

mean value: 0.7399010711510712

key: train_precision
value: [0.68345324 0.81481481 0.93577982 0.89583333 0.96969697 0.89189189
 0.82113821 0.72093023 0.98181818 0.95283019]

mean value: 0.8668186878098524

key: test_recall
value: [0.90909091 0.54545455 0.63636364 0.54545455 0.5        0.83333333
 0.91666667 0.91666667 0.18181818 0.63636364]

mean value: 0.6621212121212121

key: train_recall
value: [0.9223301  0.6407767  0.99029126 0.83495146 0.62745098 0.97058824
 0.99019608 0.91176471 0.52427184 0.98058252]

mean value: 0.8393203883495146

key: test_roc_auc
value: [0.70454545 0.56439394 0.77651515 0.64772727 0.75       0.78030303
 0.8219697  0.73106061 0.54545455 0.72727273]

mean value: 0.7049242424242423

key: train_roc_auc
value: [0.74547877 0.74685894 0.96083191 0.86845612 0.80401675 0.92704169
 0.88830192 0.78112507 0.75728155 0.96601942]

mean value: 0.84454121454407

key: test_jcc
value: [0.58823529 0.375      0.58333333 0.42857143 0.5        0.66666667
 0.73333333 0.64705882 0.16666667 0.53846154]

mean value: 0.5227327084680026

key: train_jcc
value: [0.6462585  0.55932203 0.92727273 0.76106195 0.61538462 0.86842105
 0.81451613 0.67391304 0.51923077 0.93518519]

mean value: 0.7320566006417716

MCC on Blind test: 0.31

Accuracy on Blind test: 0.65
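
The log reports only the mean of each per-fold array, which hides how much the folds disagree. As a minimal sketch, the snippet below summarises the MLP `test_mcc` values copied from the block above; the helper name `summarise` is hypothetical and is not part of the logged pipeline.

```python
import statistics

# MLP test_mcc per fold, copied from the log block above.
mlp_test_mcc = [
    0.44411739, 0.12878788, 0.58002308, 0.30240737, 0.56879646,
    0.56490196, 0.65909298, 0.50168817, 0.13245324, 0.46225016,
]

def summarise(values):
    """Return mean, sample standard deviation, min and max of fold scores."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarise(mlp_test_mcc)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

A wide min-to-max range across folds suggests that the mean alone is a fragile basis for ranking models on a dataset of this size.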

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01866984 0.01328373 0.01333165 0.01289773 0.01274443 0.01260185
 0.01383185 0.01282859 0.01323938 0.01250935]

mean value: 0.013593840599060058

key: score_time
value: [0.01174784 0.00923467 0.00871539 0.00866914 0.0085988  0.00864315
 0.00875974 0.00851679 0.00875902 0.00885749]

mean value: 0.009050202369689942

key: test_mcc
value: [0.82575758 0.91605722 0.69084928 0.76764947 0.76764947 0.91666667
 0.74242424 1.         0.91287093 0.75592895]

mean value: 0.8295853811736139

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.91304348 0.95652174 0.82608696 0.86956522 0.86956522 0.95652174
 0.86956522 1.         0.95454545 0.86363636]

mean value: 0.9079051383399209

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.95238095 0.77777778 0.88       0.85714286 0.95652174
 0.86956522 1.         0.95238095 0.84210526]

mean value: 0.8996965668453083

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.90909091 1.         1.         0.78571429 1.         1.
 0.90909091 1.         1.         1.        ]

mean value: 0.9603896103896103

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 0.90909091 0.63636364 1.         0.75       0.91666667
 0.83333333 1.         0.90909091 0.72727273]

mean value: 0.8590909090909091

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91287879 0.95454545 0.81818182 0.875      0.875      0.95833333
 0.87121212 1.         0.95454545 0.86363636]

mean value: 0.9083333333333333

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.90909091 0.63636364 0.78571429 0.75       0.91666667
 0.76923077 1.         0.90909091 0.72727273]

mean value: 0.8236763236763237

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.54
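
The Decision Tree block shows a perfect training score, a high cross-validated score and a much lower blind-test score. As a rough sketch, the gaps can be computed from the three logged numbers; the variable names are illustrative and no threshold for "too large" is implied by the log.

```python
# Values copied from the Decision Tree block above.
cv_train_mcc = 1.0                 # mean train_mcc
cv_test_mcc = 0.8295853811736139   # mean test_mcc
blind_mcc = 0.1                    # MCC on Blind test

# Gap between fitting the training folds and scoring the held-out folds.
train_cv_gap = cv_train_mcc - cv_test_mcc
# Gap between cross-validation and the separate blind test set.
cv_blind_gap = cv_test_mcc - blind_mcc

print(f"train - CV gap: {train_cv_gap:.3f}")
print(f"CV - blind gap: {cv_blind_gap:.3f}")
```

The same three-number comparison applies to every model block in this log, which makes it a convenient way to line the models up.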

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10137177 0.09971786 0.09696341 0.09599876 0.10285091 0.09974742
 0.10015845 0.10240602 0.10212231 0.09918237]

mean value: 0.10005192756652832

key: score_time
value: [0.01733947 0.0176208  0.01726961 0.01758814 0.01826119 0.01933861
 0.01860666 0.01895905 0.0190897  0.01898289]

mean value: 0.0183056116104126

key: test_mcc
value: [0.74242424 0.91666667 0.65909298 0.39393939 0.74047959 0.56490196
 0.76277007 1.         0.73029674 0.54772256]

mean value: 0.7058294203018629

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.95652174 0.82608696 0.69565217 0.86956522 0.7826087
 0.86956522 1.         0.86363636 0.77272727]

mean value: 0.850592885375494

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.86956522 0.95652174 0.8        0.69565217 0.88       0.8
 0.88888889 1.         0.86956522 0.76190476]

mean value: 0.8522097998619738

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.83333333 0.91666667 0.88888889 0.66666667 0.84615385 0.76923077
 0.8        1.         0.83333333 0.8       ]

mean value: 0.8354273504273504

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90909091 1.         0.72727273 0.72727273 0.91666667 0.83333333
 1.         1.         0.90909091 0.72727273]

mean value: 0.875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.87121212 0.95833333 0.8219697  0.6969697  0.86742424 0.78030303
 0.86363636 1.         0.86363636 0.77272727]

mean value: 0.8496212121212121

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.76923077 0.91666667 0.66666667 0.53333333 0.78571429 0.66666667
 0.8        1.         0.76923077 0.61538462]

mean value: 0.7522893772893773

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.32

Accuracy on Blind test: 0.64

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.01011992 0.01058984 0.01019692 0.00972962 0.01009941 0.01027632
 0.01017356 0.01005864 0.0098815  0.0098474 ]

mean value: 0.010097312927246093

key: score_time
value: [0.00989771 0.00945807 0.00941896 0.00952983 0.00942016 0.00951624
 0.00943208 0.00933743 0.00939727 0.00861669]

mean value: 0.00940244197845459

key: test_mcc
value: [0.47727273 0.82575758 0.56490196 0.30240737 0.44411739 0.66414149
 0.39393939 0.66414149 0.29277002 0.46225016]

mean value: 0.5091699576165252

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.73913043 0.91304348 0.7826087  0.65217391 0.69565217 0.82608696
 0.69565217 0.82608696 0.63636364 0.72727273]

mean value: 0.7494071146245059

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.72727273 0.90909091 0.76190476 0.6        0.63157895 0.81818182
 0.69565217 0.81818182 0.55555556 0.7       ]

mean value: 0.7217418711469055

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.72727273 0.90909091 0.8        0.66666667 0.85714286 0.9
 0.72727273 0.9        0.71428571 0.77777778]

mean value: 0.797950937950938

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.72727273 0.90909091 0.72727273 0.54545455 0.5        0.75
 0.66666667 0.75       0.45454545 0.63636364]

mean value: 0.6666666666666666

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.73863636 0.91287879 0.78030303 0.64772727 0.70454545 0.82954545
 0.6969697  0.82954545 0.63636364 0.72727273]

mean value: 0.7503787878787879

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.57142857 0.83333333 0.61538462 0.42857143 0.46153846 0.69230769
 0.53333333 0.69230769 0.38461538 0.53846154]

mean value: 0.5751282051282052

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.31818891 1.31016517 1.28691626 1.37142372 1.3797493  1.26361561
 1.31167197 1.2901237  1.27162528 1.29103398]

mean value: 1.3094513893127442

key: score_time
value: [0.09341335 0.08867025 0.09690428 0.09699655 0.09698176 0.08863115
 0.09124899 0.0938971  0.09380794 0.09228921]

mean value: 0.09328405857086182

key: test_mcc
value: [0.58002308 0.91666667 0.91605722 0.47727273 0.76764947 0.65151515
 0.91605722 0.91666667 0.81818182 0.83205029]

mean value: 0.7792140323001392

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.7826087  0.95652174 0.95652174 0.73913043 0.86956522 0.82608696
 0.95652174 0.95652174 0.90909091 0.90909091]

mean value: 0.8861660079051383

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.73684211 0.95652174 0.95238095 0.72727273 0.85714286 0.83333333
 0.96       0.95652174 0.90909091 0.9       ]

mean value: 0.8789106362744806

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.91666667 1.         0.72727273 1.         0.83333333
 0.92307692 1.         0.90909091 1.        ]

mean value: 0.918444055944056

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 1.         0.90909091 0.72727273 0.75       0.83333333
 1.         0.91666667 0.90909091 0.81818182]

mean value: 0.85

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.77651515 0.95833333 0.95454545 0.73863636 0.875      0.82575758
 0.95454545 0.95833333 0.90909091 0.90909091]

mean value: 0.8859848484848485

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.58333333 0.91666667 0.90909091 0.57142857 0.75       0.71428571
 0.92307692 0.91666667 0.83333333 0.81818182]

mean value: 0.7936063936063936

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.28

Accuracy on Blind test: 0.62

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.92155814 0.86642456 0.9216938  0.91805387 0.88870335 0.92368603
 0.926301   0.83419299 0.89840913 0.83216953]

mean value: 0.8931192398071289

key: score_time
value: [0.24260831 0.20261288 0.24647403 0.1979568  0.2496202  0.20900178
 0.22739148 0.21628428 0.12800908 0.18758345]

mean value: 0.21075422763824464

key: test_mcc
value: [0.56490196 0.83971912 0.82575758 0.47727273 0.74242424 0.66414149
 0.65909298 0.65151515 0.64715023 0.63636364]

mean value: 0.6708339110699807

key: train_mcc
value: [0.96097468 0.9516192  0.96170013 0.98048734 0.9707786  0.9707786
 0.95163291 0.94219063 0.94245853 0.9613463 ]

mean value: 0.9593966922193641

key: test_accuracy
value: [0.7826087  0.91304348 0.91304348 0.73913043 0.86956522 0.82608696
 0.82608696 0.82608696 0.81818182 0.81818182]

mean value: 0.833201581027668

key: train_accuracy
value: [0.9804878  0.97560976 0.9804878  0.9902439  0.98536585 0.98536585
 0.97560976 0.97073171 0.97087379 0.98058252]

mean value: 0.9795358749704002

key: test_fscore
value: [0.76190476 0.91666667 0.90909091 0.72727273 0.86956522 0.81818182
 0.84615385 0.83333333 0.83333333 0.81818182]

mean value: 0.8333684431510519

key: train_fscore
value: [0.98058252 0.97607656 0.98095238 0.99029126 0.98536585 0.98536585
 0.97584541 0.97115385 0.97142857 0.98076923]

mean value: 0.9797831488680813

key: test_precision
value: [0.8        0.84615385 0.90909091 0.72727273 0.90909091 0.9
 0.78571429 0.83333333 0.76923077 0.81818182]

mean value: 0.8298068598068599

key: train_precision
value: [0.98058252 0.96226415 0.96261682 0.99029126 0.98058252 0.98058252
 0.96190476 0.95283019 0.95327103 0.97142857]

mean value: 0.9696354358374721

key: test_recall
value: [0.72727273 1.         0.90909091 0.72727273 0.83333333 0.75
 0.91666667 0.83333333 0.90909091 0.81818182]

mean value: 0.8424242424242424

key: train_recall
value: [0.98058252 0.99029126 1.         0.99029126 0.99019608 0.99019608
 0.99019608 0.99019608 0.99029126 0.99029126]

mean value: 0.9902531886541024

key: test_roc_auc
value: [0.78030303 0.91666667 0.91287879 0.73863636 0.87121212 0.82954545
 0.8219697  0.82575758 0.81818182 0.81818182]

mean value: 0.8333333333333334

key: train_roc_auc
value: [0.98048734 0.97553779 0.98039216 0.99024367 0.9853893  0.9853893
 0.97568056 0.97082619 0.97087379 0.98058252]

mean value: 0.9795402627070247

key: test_jcc
value: [0.61538462 0.84615385 0.83333333 0.57142857 0.76923077 0.69230769
 0.73333333 0.71428571 0.71428571 0.69230769]

mean value: 0.7182051282051282

key: train_jcc
value: [0.96190476 0.95327103 0.96261682 0.98076923 0.97115385 0.97115385
 0.95283019 0.94392523 0.94444444 0.96226415]

mean value: 0.9604333553160921

MCC on Blind test: 0.38

Accuracy on Blind test: 0.67

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02152944 0.00897455 0.00889492 0.0088706  0.00894356 0.0089643
 0.00907016 0.00900245 0.00898051 0.00887012]

mean value: 0.010210061073303222

key: score_time
value: [0.01050448 0.00858045 0.00870132 0.00862598 0.00875401 0.00850987
 0.00866985 0.00858259 0.00861263 0.00853896]

mean value: 0.008808016777038574

key: test_mcc
value: [0.21969697 0.55048188 0.22407133 0.21452908 0.3030303  0.3030303
 0.33371191 0.39393939 0.09090909 0.32539569]

mean value: 0.29587959510446155

key: train_mcc
value: [0.44146616 0.44911432 0.45709726 0.49637007 0.4861007  0.48652841
 0.43786483 0.44832571 0.49218702 0.50892419]

mean value: 0.4703978666309494

key: test_accuracy
value: [0.60869565 0.73913043 0.60869565 0.60869565 0.65217391 0.65217391
 0.65217391 0.69565217 0.54545455 0.63636364]

mean value: 0.6399209486166008

key: train_accuracy
value: [0.71707317 0.72195122 0.72682927 0.74634146 0.73658537 0.74146341
 0.71707317 0.72195122 0.74271845 0.75242718]

mean value: 0.7324413923750888

key: test_fscore
value: [0.60869565 0.625      0.47058824 0.52631579 0.66666667 0.66666667
 0.6        0.69565217 0.54545455 0.5       ]

mean value: 0.5905039729642637

key: train_fscore
value: [0.69148936 0.70157068 0.71134021 0.73195876 0.7        0.72251309
 0.69473684 0.6984127  0.71957672 0.7357513 ]

mean value: 0.7107349655839269

key: test_precision
value: [0.58333333 1.         0.66666667 0.625      0.66666667 0.66666667
 0.75       0.72727273 0.54545455 0.8       ]

mean value: 0.7031060606060606

key: train_precision
value: [0.76470588 0.76136364 0.75824176 0.78021978 0.80769231 0.7752809
 0.75       0.75862069 0.79069767 0.78888889]

mean value: 0.7735711516709494

key: test_recall
value: [0.63636364 0.45454545 0.36363636 0.45454545 0.66666667 0.66666667
 0.5        0.66666667 0.54545455 0.36363636]

mean value: 0.5318181818181817

key: train_recall
value: [0.63106796 0.65048544 0.66990291 0.68932039 0.61764706 0.67647059
 0.64705882 0.64705882 0.66019417 0.68932039]

mean value: 0.657852655625357

key: test_roc_auc
value: [0.60984848 0.72727273 0.59848485 0.60227273 0.65151515 0.65151515
 0.65909091 0.6969697  0.54545455 0.63636364]

mean value: 0.6378787878787878

key: train_roc_auc
value: [0.71749476 0.72230154 0.72710832 0.74662098 0.736008   0.74114792
 0.7167333  0.72158766 0.74271845 0.75242718]

mean value: 0.732414810584428

key: test_jcc
value: [0.4375     0.45454545 0.30769231 0.35714286 0.5        0.5
 0.42857143 0.53333333 0.375      0.33333333]

mean value: 0.42271187146187145

key: train_jcc
value: [0.52845528 0.54032258 0.552      0.57723577 0.53846154 0.56557377
 0.53225806 0.53658537 0.56198347 0.58196721]

mean value: 0.5514843061067994

MCC on Blind test: 0.26

Accuracy on Blind test: 0.63

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'Z...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.0944972  0.05067468 0.04957175 0.05115271 0.05678248 0.05602765
 0.05684161 0.07069731 0.04913449 0.06043005]

mean value: 0.05958099365234375

key: score_time
value: [0.01044273 0.01050806 0.01055908 0.01056576 0.01026511 0.0102632
 0.01027846 0.01120543 0.0102067  0.01039171]

mean value: 0.010468626022338867

key: test_mcc
value: [0.74047959 1.         0.91605722 0.6992059  0.83971912 0.83971912
 0.91605722 0.91666667 1.         0.91287093]

mean value: 0.8780775779868542

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 1.         0.95652174 0.82608696 0.91304348 0.91304348
 0.95652174 0.95652174 1.         0.95454545]

mean value: 0.9345849802371542

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 1.         0.95238095 0.84615385 0.90909091 0.90909091
 0.96       0.95652174 1.         0.95238095]

mean value: 0.9342762165370861

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        1.         1.         0.73333333 1.         1.
 0.92307692 1.         1.         1.        ]

mean value: 0.9556410256410256

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 1.         0.90909091 1.         0.83333333 0.83333333
 1.         0.91666667 1.         0.90909091]

mean value: 0.921969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 1.         0.95454545 0.83333333 0.91666667 0.91666667
 0.95454545 0.95833333 1.         0.95454545]

mean value: 0.9356060606060607

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       1.         0.90909091 0.73333333 0.83333333 0.83333333
 0.92307692 0.91666667 1.         0.90909091]

mean value: 0.8807925407925408

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.53

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.0328002  0.05474067 0.06001735 0.05796957 0.03843188 0.02583218
 0.05591583 0.06007648 0.05593419 0.05865741]

mean value: 0.050037574768066403

key: score_time
value: [0.02213311 0.02232766 0.02493095 0.0224731  0.01203299 0.01206088
 0.02462769 0.0249722  0.02335501 0.01984525]

mean value: 0.020875883102416993

key: test_mcc
value: [0.56490196 0.58002308 0.91666667 0.47727273 0.5164589  0.48856385
 0.56490196 0.58930667 0.63636364 0.45454545]

mean value: 0.5789004892930631

key: train_mcc
value: [0.91223227 0.96097468 0.91223227 0.93174679 0.97115114 0.95126131
 0.95163291 0.93175328 0.94174757 0.94192516]

mean value: 0.9406657392104807

key: test_accuracy
value: [0.7826087  0.7826087  0.95652174 0.73913043 0.73913043 0.73913043
 0.7826087  0.7826087  0.81818182 0.72727273]

mean value: 0.7849802371541502

key: train_accuracy
value: [0.95609756 0.9804878  0.95609756 0.96585366 0.98536585 0.97560976
 0.97560976 0.96585366 0.97087379 0.97087379]

mean value: 0.9702723182571631

key: test_fscore
value: [0.76190476 0.73684211 0.95652174 0.72727273 0.7        0.72727273
 0.8        0.76190476 0.81818182 0.72727273]

mean value: 0.7717173368203116

key: train_fscore
value: [0.95652174 0.98058252 0.95652174 0.96618357 0.98550725 0.97536946
 0.97584541 0.96585366 0.97087379 0.97115385]

mean value: 0.9704412983643049

key: test_precision
value: [0.8        0.875      0.91666667 0.72727273 0.875      0.8
 0.76923077 0.88888889 0.81818182 0.72727273]

mean value: 0.8197513597513597

key: train_precision
value: [0.95192308 0.98058252 0.95192308 0.96153846 0.97142857 0.98019802
 0.96190476 0.96116505 0.97087379 0.96190476]

mean value: 0.9653442089647992

key: test_recall
value: [0.72727273 0.63636364 1.         0.72727273 0.58333333 0.66666667
 0.83333333 0.66666667 0.81818182 0.72727273]

mean value: 0.7386363636363636

key: train_recall
value: [0.96116505 0.98058252 0.96116505 0.97087379 1.         0.97058824
 0.99019608 0.97058824 0.97087379 0.98058252]

mean value: 0.975661526746621

key: test_roc_auc
value: [0.78030303 0.77651515 0.95833333 0.73863636 0.74621212 0.74242424
 0.78030303 0.78787879 0.81818182 0.72727273]

mean value: 0.7856060606060605

key: train_roc_auc
value: [0.95607272 0.98048734 0.95607272 0.96582905 0.98543689 0.97558538
 0.97568056 0.96587664 0.97087379 0.97087379]

mean value: 0.970278888254331

key: test_jcc
value: [0.61538462 0.58333333 0.91666667 0.57142857 0.53846154 0.57142857
 0.66666667 0.61538462 0.69230769 0.57142857]

mean value: 0.6342490842490842

key: train_jcc
value: [0.91666667 0.96190476 0.91666667 0.93457944 0.97142857 0.95192308
 0.95283019 0.93396226 0.94339623 0.94392523]

mean value: 0.9427283095732223

MCC on Blind test: 0.13

Accuracy on Blind test: 0.56

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01869988 0.0104475  0.01018763 0.01002216 0.00952911 0.01009011
 0.01007676 0.01010776 0.00896454 0.01016378]

mean value: 0.010828924179077149

key: score_time
value: [0.00920248 0.00986409 0.00962067 0.0095036  0.00955296 0.00947595
 0.00958252 0.00949979 0.00944614 0.00946665]

mean value: 0.009521484375

key: test_mcc
value: [0.06579517 0.47727273 0.56490196 0.21969697 0.22407133 0.39727608
 0.56818182 0.38932432 0.54772256 0.46225016]

mean value: 0.3916493092720405

key: train_mcc
value: [0.48780456 0.42066716 0.48336719 0.46806514 0.42940367 0.42714207
 0.40668817 0.42940367 0.41216105 0.42138641]

mean value: 0.43860890996498425

key: test_accuracy
value: [0.52173913 0.73913043 0.7826087  0.60869565 0.60869565 0.69565217
 0.7826087  0.69565217 0.77272727 0.72727273]

mean value: 0.6934782608695652

key: train_accuracy
value: [0.74146341 0.70731707 0.74146341 0.73170732 0.71219512 0.71219512
 0.70243902 0.71219512 0.7038835  0.70873786]

mean value: 0.7173596968979399

key: test_fscore
value: [0.59259259 0.72727273 0.76190476 0.60869565 0.68965517 0.74074074
 0.7826087  0.72       0.7826087  0.75      ]

mean value: 0.7156079038402876

key: train_fscore
value: [0.760181   0.73214286 0.74881517 0.75113122 0.73059361 0.7255814
 0.71361502 0.73059361 0.7239819  0.72727273]

mean value: 0.7343908501374309

key: test_precision
value: [0.5        0.72727273 0.8        0.58333333 0.58823529 0.66666667
 0.81818182 0.69230769 0.75       0.69230769]

mean value: 0.6818305224187577

key: train_precision
value: [0.71186441 0.67768595 0.73148148 0.70338983 0.68376068 0.69026549
 0.68468468 0.68376068 0.6779661  0.68376068]

mean value: 0.6928619993570155

key: test_recall
value: [0.72727273 0.72727273 0.72727273 0.63636364 0.83333333 0.83333333
 0.75       0.75       0.81818182 0.81818182]

mean value: 0.7621212121212122

key: train_recall
value: [0.81553398 0.7961165  0.76699029 0.80582524 0.78431373 0.76470588
 0.74509804 0.78431373 0.77669903 0.77669903]

mean value: 0.7816295450218922

key: test_roc_auc
value: [0.53030303 0.73863636 0.78030303 0.60984848 0.59848485 0.68939394
 0.78409091 0.69318182 0.77272727 0.72727273]

mean value: 0.6924242424242424

key: train_roc_auc
value: [0.74110032 0.70688178 0.74133828 0.73134399 0.71254521 0.71245003
 0.70264611 0.71254521 0.7038835  0.70873786]

mean value: 0.7173472301541977

key: test_jcc
value: [0.42105263 0.57142857 0.61538462 0.4375     0.52631579 0.58823529
 0.64285714 0.5625     0.64285714 0.6       ]

mean value: 0.5608131187697751

key: train_jcc
value: [0.61313869 0.57746479 0.59848485 0.60144928 0.57553957 0.56934307
 0.55474453 0.57553957 0.56737589 0.57142857]

mean value: 0.5804508784595866

MCC on Blind test: 0.41

Accuracy on Blind test: 0.7
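
Note: the key/value blocks above follow a regular layout, so they can be read back into Python for further analysis. The snippet below is a minimal sketch, assuming the log keeps the "key:" / "value: [...]" layout shown here. `parse_cv_block` is a hypothetical helper and is not part of the original scripts. It recomputes the fold means for the MultinomialNB test_mcc and train_mcc entries, which should agree with the logged "mean value" lines up to rounding of the printed folds.

```python
import re
from statistics import fmean


def parse_cv_block(text):
    """Parse 'key: NAME' / 'value: [...]' pairs into {NAME: [floats]}.

    Hypothetical helper: assumes each value list is enclosed in one pair
    of square brackets and may wrap across several lines.
    """
    pattern = re.compile(r"key:\s*(\S+)\s*\nvalue:\s*\[([^\]]*)\]")
    scores = {}
    for key, body in pattern.findall(text):
        scores[key] = [float(tok) for tok in body.split()]
    return scores


# Two blocks copied from the MultinomialNB section of this log.
sample = """key: test_mcc
value: [0.06579517 0.47727273 0.56490196 0.21969697 0.22407133 0.39727608
 0.56818182 0.38932432 0.54772256 0.46225016]

mean value: 0.3916493092720405

key: train_mcc
value: [0.48780456 0.42066716 0.48336719 0.46806514 0.42940367 0.42714207
 0.40668817 0.42940367 0.41216105 0.42138641]

mean value: 0.43860890996498425
"""

scores = parse_cv_block(sample)
test_mean = fmean(scores["test_mcc"])
train_mean = fmean(scores["train_mcc"])
print("test_mcc mean:", test_mean)
print("train_mcc mean:", train_mean)
```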

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01363993 0.01537251 0.01528311 0.01538348 0.01859713 0.01800203
 0.01694918 0.01621366 0.01610088 0.01708269]

mean value: 0.016262459754943847

key: score_time
value: [0.00964856 0.0117321  0.01158285 0.01170731 0.01168036 0.01170659
 0.01175737 0.0116291  0.01162767 0.01166821]

mean value: 0.011474013328552246

key: test_mcc
value: [0.69084928 0.22268089 0.50168817 0.31252706 0.50460839 0.82575758
 0.83971912 0.74047959 0.39735971 0.54232614]

mean value: 0.5577995920530833

key: train_mcc
value: [0.70109302 0.51269395 0.79525817 0.73218681 0.58583388 0.88020643
 0.75526392 0.86303792 0.57361333 0.82977382]

mean value: 0.7228961254133855

key: test_accuracy
value: [0.82608696 0.56521739 0.73913043 0.65217391 0.69565217 0.91304348
 0.91304348 0.86956522 0.63636364 0.72727273]

mean value: 0.7537549407114624

key: train_accuracy
value: [0.82926829 0.70731707 0.89268293 0.84878049 0.75609756 0.93658537
 0.86341463 0.92682927 0.74757282 0.90776699]

mean value: 0.841631541558134

key: test_fscore
value: [0.77777778 0.16666667 0.66666667 0.55555556 0.58823529 0.91666667
 0.90909091 0.88       0.42857143 0.625     ]

mean value: 0.6514230965113318

key: train_fscore
value: [0.79532164 0.5890411  0.88421053 0.82285714 0.67532468 0.93193717
 0.84090909 0.93150685 0.66233766 0.89839572]

mean value: 0.8031841575076744

key: test_precision
value: [1.         1.         0.85714286 0.71428571 1.         0.91666667
 1.         0.84615385 1.         1.        ]

mean value: 0.9334249084249084

key: train_precision
value: [1.         1.         0.96551724 1.         1.         1.
 1.         0.87179487 1.         1.        ]

mean value: 0.9837312113174183

key: test_recall
value: [0.63636364 0.09090909 0.54545455 0.45454545 0.41666667 0.91666667
 0.83333333 0.91666667 0.27272727 0.45454545]

mean value: 0.5537878787878788

key: train_recall
value: [0.66019417 0.41747573 0.81553398 0.69902913 0.50980392 0.87254902
 0.7254902  1.         0.49514563 0.81553398]

mean value: 0.7010755758614126

key: test_roc_auc
value: [0.81818182 0.54545455 0.73106061 0.64393939 0.70833333 0.91287879
 0.91666667 0.86742424 0.63636364 0.72727273]

mean value: 0.7507575757575757

key: train_roc_auc
value: [0.83009709 0.70873786 0.89306111 0.84951456 0.75490196 0.93627451
 0.8627451  0.92718447 0.74757282 0.90776699]

mean value: 0.841785646297354

key: test_jcc
value: [0.63636364 0.09090909 0.5        0.38461538 0.41666667 0.84615385
 0.83333333 0.78571429 0.27272727 0.45454545]

mean value: 0.5221028971028971

key: train_jcc
value: [0.66019417 0.41747573 0.79245283 0.69902913 0.50980392 0.87254902
 0.7254902  0.87179487 0.49514563 0.81553398]

mean value: 0.6859469480015152

MCC on Blind test: 0.18

Accuracy on Blind test: 0.59

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01696992 0.01531029 0.01493359 0.01531291 0.01478624 0.01416588
 0.01495624 0.01490998 0.01478863 0.01566005]

mean value: 0.01517937183380127

key: score_time
value: [0.01180148 0.01169944 0.01167941 0.01164746 0.01165843 0.01172638
 0.01163912 0.01172733 0.01157904 0.01167846]

mean value: 0.01168365478515625

key: test_mcc
value: [0.39393939 0.6992059  0.32232919 0.56879646 0.76764947 0.82575758
 0.76764947 0.91666667 0.64715023 0.23570226]

mean value: 0.6144846616230837

key: train_mcc
value: [0.87817847 0.81217608 0.3623663  0.70796649 0.92194936 0.86485629
 0.66933669 0.8742382  0.85045167 0.56613852]

mean value: 0.7507658057776959

key: test_accuracy
value: [0.69565217 0.82608696 0.60869565 0.73913043 0.86956522 0.91304348
 0.86956522 0.95652174 0.81818182 0.59090909]

mean value: 0.7887351778656126

key: train_accuracy
value: [0.93658537 0.89756098 0.61463415 0.83414634 0.96097561 0.93170732
 0.8097561  0.93658537 0.9223301  0.74271845]

mean value: 0.8586999763201516

key: test_fscore
value: [0.69565217 0.84615385 0.30769231 0.78571429 0.85714286 0.91666667
 0.85714286 0.95652174 0.8        0.68965517]

mean value: 0.7712341905970092

key: train_fscore
value: [0.94009217 0.90748899 0.37795276 0.85833333 0.96078431 0.92929293
 0.76363636 0.93779904 0.91752577 0.7953668 ]

mean value: 0.8388272460201259

key: test_precision
value: [0.66666667 0.73333333 1.         0.64705882 1.         0.91666667
 1.         1.         0.88888889 0.55555556]

mean value: 0.8408169934640523

key: train_precision
value: [0.89473684 0.83064516 1.         0.75182482 0.96078431 0.95833333
 1.         0.91588785 0.97802198 0.66025641]

mean value: 0.8950490706718336

key: test_recall
value: [0.72727273 1.         0.18181818 1.         0.75       0.91666667
 0.75       0.91666667 0.72727273 0.90909091]

mean value: 0.7878787878787878

key: train_recall
value: [0.99029126 1.         0.23300971 1.         0.96078431 0.90196078
 0.61764706 0.96078431 0.86407767 1.        ]

mean value: 0.8528555111364935

key: test_roc_auc
value: [0.6969697  0.83333333 0.59090909 0.75       0.875      0.91287879
 0.875      0.95833333 0.81818182 0.59090909]

mean value: 0.7901515151515152

key: train_roc_auc
value: [0.9363221  0.89705882 0.61650485 0.83333333 0.96097468 0.93156292
 0.80882353 0.93670284 0.9223301  0.74271845]

mean value: 0.8586331620026652

key: test_jcc
value: [0.53333333 0.73333333 0.18181818 0.64705882 0.75       0.84615385
 0.75       0.91666667 0.66666667 0.52631579]

mean value: 0.6551346640975124

key: train_jcc
value: [0.88695652 0.83064516 0.23300971 0.75182482 0.9245283  0.86792453
 0.61764706 0.88288288 0.84761905 0.66025641]

mean value: 0.7503294439056115

MCC on Blind test: 0.2

Accuracy on Blind test: 0.6

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.12839508 0.113796   0.11646557 0.11455917 0.1184082  0.11943507
 0.11837411 0.11078691 0.11046553 0.11171436]

mean value: 0.11624000072479249

key: score_time
value: [0.01480055 0.01611924 0.01634765 0.01499295 0.01620007 0.01611018
 0.01495361 0.0148952  0.01476741 0.01726556]

mean value: 0.01564524173736572

key: test_mcc
value: [0.74047959 0.82575758 0.91605722 0.66414149 0.83971912 0.91666667
 0.91605722 0.83971912 0.81818182 0.91287093]

mean value: 0.8389650763028634

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.86956522 0.91304348 0.95652174 0.82608696 0.91304348 0.95652174
 0.95652174 0.91304348 0.90909091 0.95454545]

mean value: 0.916798418972332

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.90909091 0.95238095 0.83333333 0.90909091 0.95652174
 0.96       0.90909091 0.90909091 0.95238095]

mean value: 0.9148123470732166

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.9        0.90909091 1.         0.76923077 1.         1.
 0.92307692 1.         0.90909091 1.        ]

mean value: 0.941048951048951

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667
 1.         0.83333333 0.90909091 0.90909091]

mean value: 0.8946969696969697

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.86742424 0.91287879 0.95454545 0.82954545 0.91666667 0.95833333
 0.95454545 0.91666667 0.90909091 0.95454545]

mean value: 0.9174242424242425

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.83333333 0.90909091 0.71428571 0.83333333 0.91666667
 0.92307692 0.83333333 0.83333333 0.90909091]

mean value: 0.8455544455544456

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.01

Accuracy on Blind test: 0.5
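
Note: the test_mcc, test_fscore and test_jcc values are all functions of the per-fold confusion matrix. The sketch below is illustrative only. The counts (TP=9, TN=11, FP=1, FN=2) are not printed in this log; they are inferred from the first AdaBoost fold above (precision 0.9, recall 0.81818182, accuracy 0.86956522, assuming 23 test samples in that fold). The function names are hypothetical and use only the standard formulas.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def jaccard(tp, fp, fn):
    """Jaccard index of the positive class."""
    return tp / (tp + fp + fn)


# Counts inferred from AdaBoost fold 1 (an assumption, see note above).
tp, tn, fp, fn = 9, 11, 1, 2

fold_mcc = mcc(tp, tn, fp, fn)
fold_f = f_score(tp, fp, fn)
fold_jcc = jaccard(tp, fp, fn)
print("MCC:", fold_mcc)
print("F-score:", fold_f)
print("Jaccard:", fold_jcc)
```

With these counts the three values should line up with the first entries of test_mcc, test_fscore and test_jcc in the AdaBoost block.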

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.04284787 0.04148149 0.05290413 0.04513884 0.03626966 0.05176854
 0.04080129 0.04704714 0.04275584 0.04380989]

mean value: 0.04448246955871582

key: score_time
value: [0.01655555 0.02902532 0.01787877 0.02407384 0.01867771 0.02897787
 0.01784182 0.03749013 0.01835752 0.02550364]

mean value: 0.023438215255737305

key: test_mcc
value: [0.74047959 0.83743579 0.91605722 0.58930667 0.76764947 0.83971912
 0.91605722 1.         1.         0.81818182]

mean value: 0.8424886910191745

key: train_mcc
value: [0.98067587 0.98067587 1.         1.         1.         1.
 0.99029126 0.99029034 0.99033794 0.99033794]

mean value: 0.9922609226032173

key: test_accuracy
value: [0.86956522 0.91304348 0.95652174 0.7826087  0.86956522 0.91304348
 0.95652174 1.         1.         0.90909091]

mean value: 0.9169960474308301

key: train_accuracy
value: [0.9902439  0.9902439  1.         1.         1.         1.
 0.99512195 0.99512195 0.99514563 0.99514563]

mean value: 0.9961022969452995

key: test_fscore
value: [0.85714286 0.9        0.95238095 0.8        0.85714286 0.90909091
 0.96       1.         1.         0.90909091]

mean value: 0.9144848484848485

key: train_fscore
value: [0.99019608 0.99019608 1.         1.         1.         1.
 0.99512195 0.99507389 0.99516908 0.99516908]

mean value: 0.9960926163959081

key: test_precision
value: [0.9        1.         1.         0.71428571 1.         1.
 0.92307692 1.         1.         0.90909091]

mean value: 0.9446453546453546

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 0.99029126 1.         0.99038462 0.99038462]

mean value: 0.9971060492905153

key: test_recall
value: [0.81818182 0.81818182 0.90909091 0.90909091 0.75       0.83333333
 1.         1.         1.         0.90909091]

mean value: 0.8946969696969697

key: train_recall
value: [0.98058252 0.98058252 1.         1.         1.         1.
 1.         0.99019608 1.         1.        ]

mean value: 0.9951361126975062

key: test_roc_auc
value: [0.86742424 0.90909091 0.95454545 0.78787879 0.875      0.91666667
 0.95454545 1.         1.         0.90909091]

mean value: 0.9174242424242425

key: train_roc_auc
value: [0.99029126 0.99029126 1.         1.         1.         1.
 0.99514563 0.99509804 0.99514563 0.99514563]

mean value: 0.9961117456691414

key: test_jcc
value: [0.75       0.81818182 0.90909091 0.66666667 0.75       0.83333333
 0.92307692 1.         1.         0.83333333]

mean value: 0.8483682983682984

key: train_jcc
value: [0.98058252 0.98058252 1.         1.         1.         1.
 0.99029126 0.99019608 0.99038462 0.99038462]

mean value: 0.9922421619880215

MCC on Blind test: 0.13

Accuracy on Blind test: 0.55

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.04829431 0.08197236 0.07691717 0.05902553 0.02872419 0.02841234
 0.06640029 0.04778814 0.02781248 0.0365293 ]

mean value: 0.05018761157989502

key: score_time
value: [0.02258968 0.02218485 0.02187991 0.01301789 0.01300526 0.01682043
 0.01924825 0.01270652 0.01270413 0.02137733]

mean value: 0.017553424835205077

key: test_mcc
value: [0.3030303  0.83743579 0.31252706 0.12406456 0.56818182 0.47727273
 0.41096386 0.82575758 0.48795004 0.2773501 ]

mean value: 0.462453382427294

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.65217391 0.91304348 0.65217391 0.56521739 0.7826087  0.73913043
 0.69565217 0.91304348 0.72727273 0.63636364]

mean value: 0.7276679841897233

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.63636364 0.9        0.55555556 0.5        0.7826087  0.75
 0.66666667 0.91666667 0.66666667 0.6       ]

mean value: 0.6974527887571366

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.63636364 1.         0.71428571 0.55555556 0.81818182 0.75
 0.77777778 0.91666667 0.85714286 0.66666667]

mean value: 0.7692640692640693

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.63636364 0.81818182 0.45454545 0.45454545 0.75       0.75
 0.58333333 0.91666667 0.54545455 0.54545455]

mean value: 0.6454545454545455

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.65151515 0.90909091 0.64393939 0.56060606 0.78409091 0.73863636
 0.70075758 0.91287879 0.72727273 0.63636364]

mean value: 0.7265151515151516

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.46666667 0.81818182 0.38461538 0.33333333 0.64285714 0.6
 0.5        0.84615385 0.5        0.42857143]

mean value: 0.5520379620379621

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.22

Accuracy on Blind test: 0.61
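
Note: several models above score well in cross-validation but poorly on the blind test. The sketch below is one possible way to tabulate that gap, not the method used by the original scripts. The numbers are copied from this section of the log and rounded to four decimals (mean train_mcc, mean test_mcc, MCC on Blind test); the sort order and column layout are arbitrary choices.

```python
# (model name, mean train MCC, mean CV test MCC, blind-test MCC)
# Values copied from the log blocks above, rounded to four decimals.
results = [
    ("Multinomial", 0.4386, 0.3916, 0.41),
    ("Passive Aggresive", 0.7229, 0.5578, 0.18),
    ("Stochastic GDescent", 0.7508, 0.6145, 0.20),
    ("AdaBoost Classifier", 1.0000, 0.8390, 0.01),
    ("Bagging Classifier", 0.9923, 0.8425, 0.13),
    ("Gaussian Process", 1.0000, 0.4625, 0.22),
]

# Sort by the drop from cross-validated MCC to blind-test MCC, largest first.
rows = sorted(results, key=lambda r: r[2] - r[3], reverse=True)

print(f"{'model':<22}{'train':>8}{'cv':>8}{'blind':>8}{'cv-blind':>10}")
for name, train, cv, blind in rows:
    print(f"{name:<22}{train:>8.2f}{cv:>8.2f}{blind:>8.2f}{cv - blind:>10.2f}")
```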

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.37749887 0.35624838 0.34953141 0.35143161 0.35564804 0.35179186
 0.34444571 0.35502386 0.34801006 0.35423851]

mean value: 0.35438683032989504

key: score_time
value: [0.00946093 0.00907135 0.00899053 0.00892138 0.00922036 0.00899267
 0.00900412 0.0091064  0.00907969 0.0090704 ]

mean value: 0.009091782569885253

key: test_mcc
value: [0.91666667 1.         0.91605722 0.6992059  0.76764947 1.
 0.91605722 1.         1.         0.91287093]

mean value: 0.912850741785816

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95652174 1.         0.95652174 0.82608696 0.86956522 1.
 0.95652174 1.         1.         0.95454545]

mean value: 0.9519762845849803

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95652174 1.         0.95238095 0.84615385 0.85714286 1.
 0.96       1.         1.         0.95238095]

mean value: 0.9524580347189042

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.91666667 1.         1.         0.73333333 1.         1.
 0.92307692 1.         1.         1.        ]

mean value: 0.9573076923076923

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         1.         0.90909091 1.         0.75       1.
 1.         1.         1.         0.90909091]

mean value: 0.9568181818181818

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95833333 1.         0.95454545 0.83333333 0.875      1.
 0.95454545 1.         1.         0.95454545]

mean value: 0.953030303030303

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.91666667 1.         0.90909091 0.73333333 0.75       1.
 0.92307692 1.         1.         0.90909091]

mean value: 0.9141258741258741

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.54

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01773167 0.01976418 0.03433776 0.01997304 0.01990628 0.02007365
 0.02028775 0.0202179  0.02021074 0.0203495 ]

mean value: 0.021285247802734376

key: score_time
value: [0.01196408 0.014189   0.01221085 0.01400971 0.01760888 0.02596092
 0.01817083 0.01999259 0.0198195  0.0188601 ]

mean value: 0.017278647422790526

key: test_mcc
value: [0.63327851 0.83971912 0.76764947 0.43929769 0.76277007 0.62050523
 0.62050523 0.83743579 0.68313005 0.61237244]

mean value: 0.6816663591347039

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.7826087  0.91304348 0.86956522 0.65217391 0.86956522 0.7826087
 0.7826087  0.91304348 0.81818182 0.77272727]

mean value: 0.8156126482213438

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.81481481 0.91666667 0.88       0.73333333 0.88888889 0.82758621
 0.82758621 0.92307692 0.84615385 0.81481481]

mean value: 0.8472921701542391

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.6875     0.84615385 0.78571429 0.57894737 0.8        0.70588235
 0.70588235 0.85714286 0.73333333 0.6875    ]

mean value: 0.7388056396647728

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.79166667 0.91666667 0.875      0.66666667 0.86363636 0.77272727
 0.77272727 0.90909091 0.81818182 0.77272727]

mean value: 0.8159090909090909

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6875     0.84615385 0.78571429 0.57894737 0.8        0.70588235
 0.70588235 0.85714286 0.73333333 0.6875    ]

mean value: 0.7388056396647728

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.51

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.0230031  0.03526139 0.03538561 0.02961564 0.03531909 0.03540421
 0.03581238 0.03637147 0.03165317 0.03724432]

mean value: 0.033507037162780764

key: score_time
value: [0.01678348 0.02226853 0.02215648 0.02117872 0.02216363 0.02219868
 0.02224374 0.02224016 0.02391219 0.02239656]

mean value: 0.02175421714782715

key: test_mcc
value: [0.82575758 0.74242424 0.65909298 0.65151515 0.76764947 0.74047959
 0.82575758 0.82575758 0.73029674 0.46225016]

mean value: 0.7230981074087546

key: train_mcc
value: [0.92263761 0.90259929 0.93209539 0.93209539 0.95236324 0.92213232
 0.92213232 0.903143   0.92250402 0.92302639]

mean value: 0.9234728990174977

key: test_accuracy
value: [0.91304348 0.86956522 0.82608696 0.82608696 0.86956522 0.86956522
 0.91304348 0.91304348 0.86363636 0.72727273]

mean value: 0.8590909090909091

key: train_accuracy
value: [0.96097561 0.95121951 0.96585366 0.96585366 0.97560976 0.96097561
 0.96097561 0.95121951 0.96116505 0.96116505]

mean value: 0.9615013023916646

key: test_fscore
value: [0.90909091 0.86956522 0.8        0.81818182 0.85714286 0.88
 0.91666667 0.91666667 0.86956522 0.7       ]

mean value: 0.8536879352531526

key: train_fscore
value: [0.96190476 0.95192308 0.96650718 0.96650718 0.97607656 0.96116505
 0.96116505 0.95192308 0.96153846 0.96190476]

mean value: 0.9620615145372428

key: test_precision
value: [0.90909091 0.83333333 0.88888889 0.81818182 1.         0.84615385
 0.91666667 0.91666667 0.83333333 0.77777778]

mean value: 0.874009324009324

key: train_precision
value: [0.94392523 0.94285714 0.95283019 0.95283019 0.95327103 0.95192308
 0.95192308 0.93396226 0.95238095 0.94392523]

mean value: 0.9479828385920785

key: test_recall
value: [0.90909091 0.90909091 0.72727273 0.81818182 0.75       0.91666667
 0.91666667 0.91666667 0.90909091 0.63636364]

mean value: 0.8409090909090909

key: train_recall
value: [0.98058252 0.96116505 0.98058252 0.98058252 1.         0.97058824
 0.97058824 0.97058824 0.97087379 0.98058252]

mean value: 0.9766133637921188

/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:195: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:198: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)

key: test_roc_auc
value: [0.91287879 0.87121212 0.8219697  0.82575758 0.875      0.86742424
 0.91287879 0.91287879 0.86363636 0.72727273]

mean value: 0.859090909090909

key: train_roc_auc
value: [0.9608795  0.95117076 0.96578146 0.96578146 0.97572816 0.96102227
 0.96102227 0.95131354 0.96116505 0.96116505]

mean value: 0.961502950694841

key: test_jcc
value: [0.83333333 0.76923077 0.66666667 0.69230769 0.75       0.78571429
 0.84615385 0.84615385 0.76923077 0.53846154]

mean value: 0.7497252747252747

key: train_jcc
value: [0.9266055  0.90825688 0.93518519 0.93518519 0.95327103 0.92523364
 0.92523364 0.90825688 0.92592593 0.9266055 ]

mean value: 0.9269759384695507

MCC on Blind test: 0.21

Accuracy on Blind test: 0.6

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
       'kd_values',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=166)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.31841111 0.24217796 0.23903775 0.24415827 0.23731065 0.22925878
 0.26342988 0.2629621  0.31826568 0.26236701]

mean value: 0.26173791885375974

key: score_time
value: [0.02236819 0.02346969 0.02272415 0.02223444 0.0223434  0.02419353
 0.02253866 0.025352   0.02559161 0.02376676]

mean value: 0.023458242416381836

key: test_mcc
value: [0.65151515 0.56490196 0.65909298 0.65151515 0.66414149 0.74047959
 0.74242424 0.82575758 0.63636364 0.36514837]

mean value: 0.6501340144993133

key: train_mcc
value: [0.92211753 0.92263761 0.93209539 0.93209539 0.9707786  0.92213232
 0.92213232 0.903143   0.95150116 0.94192516]

mean value: 0.932055849012585

key: test_accuracy
value: [0.82608696 0.7826087  0.82608696 0.82608696 0.82608696 0.86956522
 0.86956522 0.91304348 0.81818182 0.68181818]

mean value: 0.8239130434782609

key: train_accuracy
value: [0.96097561 0.96097561 0.96585366 0.96585366 0.98536585 0.96097561
 0.96097561 0.95121951 0.97572816 0.97087379]

mean value: 0.9658797063698792

key: test_fscore
value: [0.81818182 0.76190476 0.8        0.81818182 0.81818182 0.88
 0.86956522 0.91666667 0.81818182 0.66666667]

mean value: 0.8167530585356673

key: train_fscore
value: [0.96153846 0.96190476 0.96650718 0.96650718 0.98536585 0.96116505
 0.96116505 0.95192308 0.97584541 0.97115385]

mean value: 0.9663075861961067

key: test_precision
value: [0.81818182 0.8        0.88888889 0.81818182 0.9        0.84615385
 0.90909091 0.91666667 0.81818182 0.7       ]

mean value: 0.8415345765345765

key: train_precision
value: [0.95238095 0.94392523 0.95283019 0.95283019 0.98058252 0.95192308
 0.95192308 0.93396226 0.97115385 0.96190476]

mean value: 0.9553416113711852

key: test_recall
value: [0.81818182 0.72727273 0.72727273 0.81818182 0.75       0.91666667
 0.83333333 0.91666667 0.81818182 0.63636364]

mean value: 0.7962121212121213

key: train_recall
value: [0.97087379 0.98058252 0.98058252 0.98058252 0.99019608 0.97058824
 0.97058824 0.97058824 0.98058252 0.98058252]

mean value: 0.9775747192080716

key: test_roc_auc
value: [0.82575758 0.78030303 0.8219697  0.82575758 0.82954545 0.86742424
 0.87121212 0.91287879 0.81818182 0.68181818]

mean value: 0.8234848484848485

key: train_roc_auc
value: [0.96092709 0.9608795  0.96578146 0.96578146 0.9853893  0.96102227
 0.96102227 0.95131354 0.97572816 0.97087379]

mean value: 0.965871882733676

key: test_jcc
value: [0.69230769 0.61538462 0.66666667 0.69230769 0.69230769 0.78571429
 0.76923077 0.84615385 0.69230769 0.5       ]

mean value: 0.6952380952380952

key: train_jcc
value: [0.92592593 0.9266055  0.93518519 0.93518519 0.97115385 0.92523364
 0.92523364 0.90825688 0.95283019 0.94392523]

mean value: 0.9349535239814974

MCC on Blind test: 0.11

Accuracy on Blind test: 0.55