LSHTM_analysis/scripts/ml/log_rpob_rt.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_rt.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
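The in-place sort above runs on a slice of a larger DataFrame, which is what triggers the SettingWithCopyWarning. A minimal, hypothetical sketch of the usual remedy (not the ml_data_rt.py code itself): take an explicit .copy() of the slice before sorting.

import pandas as pd

# Toy frame standing in for the real data; 'keep' marks the rows the mask selects.
df = pd.DataFrame({"ligand_distance": [5.2, 1.3, 3.8], "keep": [True, True, False]})

# An explicit copy makes mask_check its own object, so the in-place sort
# can no longer write through to a view of df and the warning disappears.
mask_check = df[df["keep"]].copy()
mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)
print(mask_check)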
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
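A minimal sketch of the kind of imputation the lines above report, assuming a median fill of the odds-ratio column followed by recomputing its log10 counterpart; the actual script may impute differently.

import numpy as np
import pandas as pd

# Toy odds-ratio column with missing values, standing in for or_mychisq.
df = pd.DataFrame({"or_mychisq": [2.5, np.nan, 0.8, np.nan, 4.1]})

# Median fill, then recompute the log10 column so both end up NULL-free.
df["or_mychisq"] = df["or_mychisq"].fillna(df["or_mychisq"].median())
df["log10_or_mychisq"] = np.log10(df["or_mychisq"])

print(df.isna().sum())  # both columns should report 0 NULLs, as in the log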
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 545, 1: 30}) Data dim: (575, 176)
-------------------------------------------------------------
Successfully split data: REVERSE training
imputed values: training set
actual values: blind test set
Train data size: (575, 176)
Test data size: (557, 176)
y_train numbers: Counter({0: 545, 1: 30})
y_train ratio: 18.166666666666668
y_test numbers: Counter({0: 282, 1: 275})
y_test ratio: 1.0254545454545454
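The two ratio lines appear to be majority/minority quotients of the Counter outputs; a quick check of that arithmetic, using the counts printed above:

from collections import Counter

y_train_counts = Counter({0: 545, 1: 30})
y_test_counts = Counter({0: 282, 1: 275})

# 545 / 30 = 18.166..., 282 / 275 = 1.0254..., matching the logged ratios.
print("y_train ratio:", y_train_counts[0] / y_train_counts[1])
print("y_test ratio:", y_test_counts[0] / y_test_counts[1])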
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 545, 1: 545})
(1090, 176)
Simple Random UnderSampling
Counter({0: 30, 1: 30})
(60, 176)
Simple Combined Over and UnderSampling
Counter({0: 545, 1: 545})
(1090, 176)
SMOTE_NC OverSampling
Counter({0: 545, 1: 545})
(1090, 176)
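A minimal sketch of how class counts like these can be produced with imbalanced-learn; the toy matrix, the categorical column index and the parameter choices are assumptions, not taken from the repository (the combined over-and-under sampler is omitted here).

import numpy as np
from collections import Counter
from imblearn.over_sampling import RandomOverSampler, SMOTENC
from imblearn.under_sampling import RandomUnderSampler

# Toy stand-in for the 575 x 176 training matrix; the last column is categorical.
rng = np.random.default_rng(42)
X = np.hstack([rng.normal(size=(100, 3)), rng.integers(0, 3, size=(100, 1))])
y = np.array([0] * 90 + [1] * 10)

# Random oversampling duplicates minority rows up to the majority count.
X_ros, y_ros = RandomOverSampler(random_state=42).fit_resample(X, y)
print("OverSampling:", Counter(y_ros))

# Random undersampling discards majority rows down to the minority count.
X_rus, y_rus = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("UnderSampling:", Counter(y_rus))

# SMOTE-NC synthesises minority rows while respecting categorical columns.
X_sm, y_sm = SMOTENC(categorical_features=[3], random_state=42).fit_resample(X, y)
print("SMOTE_NC:", Counter(y_sm))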
#####################################################################
Running ML analysis: REVERSE training
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_rt/
Sanity checks:
Total input features: 176
Training data size: (575, 176)
Test data size: (557, 176)
Target feature numbers (training data): Counter({0: 545, 1: 30})
Target features ratio (training data): 18.166666666666668
Target feature numbers (test data): Counter({0: 282, 1: 275})
Target features ratio (test data): 1.0254545454545454
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
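The 'features match' line boils down to the group counts above summing to the totals in the sanity checks; the arithmetic spelled out below is not the script's actual assertion, only a check of the logged numbers.

# 37 structural + 123 AAindex + 3 evolutionary + 6 genomic = 169 numerical features,
# and 169 numerical + 7 categorical = 176 total input features.
n_structural, n_aaindex, n_evolutionary, n_genomic, n_categorical = 37, 123, 3, 6, 7

n_numerical = n_structural + n_aaindex + n_evolutionary + n_genomic
assert n_numerical == 169
assert n_numerical + n_categorical == 176
print("Pass: No. of features match")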
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
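The lbfgs ConvergenceWarning above suggests raising max_iter or scaling the data; a minimal sketch of that remedy, with max_iter=1000 as an assumed value rather than anything taken from the script (the MinMaxScaler already in the pipeline covers the scaling half of the advice):

from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

# A larger iteration budget gives lbfgs room to converge on this feature set.
lr = LogisticRegression(random_state=42, max_iter=1000)
lr_cv = LogisticRegressionCV(random_state=42, max_iter=1000)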
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
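A minimal reconstruction of the preprocessing-plus-model pipeline printed above, with short illustrative column lists standing in for the full 169 numerical and 7 categorical names:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Truncated stand-ins for the real column lists shown in the log.
num_cols = ["ligand_distance", "ligand_affinity_change", "duet_stability_change"]
cat_cols = ["ss_class", "aa_prop_change", "active_site"]

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])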
key: fit_time
value: [0.03620291 0.03169894 0.03200722 0.03302526 0.03463435 0.03355765
0.03698945 0.03275728 0.03071117 0.02976274]
mean value: 0.03313469886779785
key: score_time
value: [0.01243091 0.0119307 0.01226759 0.01270723 0.01222587 0.01198983
0.01438379 0.01209283 0.01194143 0.01193285]
mean value: 0.01239030361175537
key: test_mcc
value: [0.64848485 0.56713087 0.56713087 0. 0.38251843 0.56694671
0.2962963 0.38204659 0.80903983 0.64814815]
mean value: 0.4867742597878121
key: train_mcc
value: [0.78551705 0.78551705 0.78589709 0.8321053 0.80909988 0.78553305
0.76169209 0.80911474 0.78553305 0.80911474]
mean value: 0.7949124038845846
key: test_accuracy
value: [0.96551724 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228
0.92982456 0.94736842 0.98245614 0.96491228]
mean value: 0.9582577132486388
key: train_accuracy
value: [0.98065764 0.98065764 0.98065764 0.98452611 0.98259188 0.98069498
0.97876448 0.98262548 0.98069498 0.98262548]
mean value: 0.9814496314496315
key: test_fscore
value: [0.66666667 0.5 0.5 0. 0.4 0.5
0.33333333 0.4 0.8 0.66666667]
mean value: 0.4766666666666667
key: train_fscore
value: [0.77272727 0.77272727 0.7826087 0.82608696 0.8 0.77272727
0.75555556 0.8 0.77272727 0.8 ]
mean value: 0.7855160298638559
key: test_precision
value: [0.66666667 1. 1. 0. 0.5 1.
0.33333333 0.5 1. 0.66666667]
mean value: 0.6666666666666666
key: train_precision
value: [1. 1. 0.94736842 1. 1. 1.
0.94444444 1. 1. 1. ]
mean value: 0.9891812865497076
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333
0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.39999999999999997
key: train_recall
value: [0.62962963 0.62962963 0.66666667 0.7037037 0.66666667 0.62962963
0.62962963 0.66666667 0.62962963 0.66666667]
mean value: 0.6518518518518519
key: test_roc_auc
value: [0.82424242 0.66666667 0.66666667 0.5 0.65757576 0.66666667
0.64814815 0.65740741 0.83333333 0.82407407]
mean value: 0.6944781144781145
key: train_roc_auc
value: [0.81481481 0.81481481 0.83231293 0.85185185 0.83333333 0.81481481
0.81379648 0.83333333 0.81481481 0.83333333]
mean value: 0.8257220521157094
key: test_jcc
value: [0.5 0.33333333 0.33333333 0. 0.25 0.33333333
0.2 0.25 0.66666667 0.5 ]
mean value: 0.33666666666666667
key: train_jcc
value: [0.62962963 0.62962963 0.64285714 0.7037037 0.66666667 0.62962963
0.60714286 0.66666667 0.62962963 0.66666667]
mean value: 0.6472222222222223
MCC on Blind test: 0.2
Accuracy on Blind test: 0.55
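A minimal sketch of how per-fold dictionaries like the one above and the blind-test scores could be produced with scikit-learn's cross_validate; the toy data, the scoring dictionary and the 10-fold setting are assumptions, not the repository's configuration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate

# Toy imbalanced training set and a separate blind-test set.
X_train, y_train = make_classification(n_samples=200, weights=[0.9], random_state=42)
X_bts, y_bts = make_classification(n_samples=100, random_state=0)

scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
clf = LogisticRegression(random_state=42, max_iter=1000)

# cross_validate returns the same kind of dict the log walks through:
# fit_time, score_time and test_/train_ arrays with one entry per fold.
cv_out = cross_validate(clf, X_train, y_train, cv=10, scoring=scoring,
                        return_train_score=True)
for key, value in cv_out.items():
    print("key:", key)
    print("mean value:", value.mean())

# Blind-test metrics come from refitting on the full training set.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_bts)
print("MCC on Blind test:", round(matthews_corrcoef(y_bts, y_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_bts, y_pred), 2))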
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[the ConvergenceWarning above is repeated verbatim for the remaining LogisticRegressionCV fits; one further UndefinedMetricWarning about ill-defined precision is also emitted]
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.88869524 0.74637437 0.73213053 0.80213022 0.75085831 0.79953766
0.74605346 0.7229135 0.95076203 0.73621321]
mean value: 0.78756685256958
key: score_time
value: [0.01215458 0.01214933 0.01236701 0.01245666 0.01223445 0.01911211
0.01208758 0.01212478 0.01212263 0.01623821]
mean value: 0.013304734230041504
key: test_mcc
value: [ 0.80917359 0.56713087 0.56713087 0. -0.03093441 0.56694671
0.2962963 0.38204659 0.80903983 0.2962963 ]
mean value: 0.4263126653890895
key: train_mcc
value: [0.5982906 0.62811222 0.89810042 0.65669104 0.68418345 0.68420281
0.65717642 0.62813245 0.62813245 0.98030899]
mean value: 0.7043330842580746
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.93103448 0.96491228
0.92982456 0.94736842 0.98245614 0.92982456]
mean value: 0.9547489413188143
key: train_accuracy
value: [0.96711799 0.96905222 0.99032882 0.97098646 0.9729207 0.97297297
0.97104247 0.96911197 0.96911197 0.9980695 ]
mean value: 0.9750715069864005
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0.5
0.33333333 0.4 0.8 0.33333333]
mean value: 0.4166666666666667
key: train_fscore
value: [0.54054054 0.57894737 0.89795918 0.61538462 0.65 0.65
0.63414634 0.57894737 0.57894737 0.98113208]
mean value: 0.6706004861796896
key: test_precision
value: [1. 1. 1. 0. 0. 1.
0.33333333 0.5 1. 0.33333333]
mean value: 0.6166666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.92857143 1. 1. 1. ]
mean value: 0.9928571428571429
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0.33333333 0.66666667 0.33333333]
mean value: 0.3333333333333333
key: train_recall
value: [0.37037037 0.40740741 0.81481481 0.44444444 0.48148148 0.48148148
0.48148148 0.40740741 0.40740741 0.96296296]
mean value: 0.5259259259259259
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.49090909 0.66666667
0.64814815 0.65740741 0.83333333 0.64814815]
mean value: 0.6611279461279461
key: train_roc_auc
value: [0.68518519 0.7037037 0.90740741 0.72222222 0.74074074 0.74074074
0.73972241 0.7037037 0.7037037 0.98148148]
mean value: 0.762861129969073
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.2 0.25 0.66666667 0.2 ]
mean value: 0.29833333333333334
key: train_jcc
value: [0.37037037 0.40740741 0.81481481 0.44444444 0.48148148 0.48148148
0.46428571 0.40740741 0.40740741 0.96296296]
mean value: 0.5242063492063492
MCC on Blind test: 0.17
Accuracy on Blind test: 0.54
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01631737 0.01177216 0.01084805 0.01043868 0.0100987 0.0101819
0.01021171 0.01049137 0.01006961 0.01019645]
mean value: 0.01106259822845459
key: score_time
value: [0.01252532 0.00943542 0.00935078 0.00916195 0.00890803 0.00936007
0.00899339 0.00960541 0.00887513 0.00927067]
mean value: 0.009548616409301759
key: test_mcc
value: [0.43452409 0.51168172 0.1762953 0.28417826 0.45726459 0.28291258
0.51099032 0.30441977 0.68718427 0.51099032]
mean value: 0.41604412240331595
key: train_mcc
value: [0.42281221 0.43803842 0.60914572 0.45594583 0.46618385 0.43552025
0.44078566 0.44401007 0.42989307 0.49523539]
mean value: 0.4637570449954174
key: test_accuracy
value: [0.82758621 0.87931034 0.87931034 0.82758621 0.84482759 0.8245614
0.87719298 0.84210526 0.94736842 0.87719298]
mean value: 0.8627041742286751
key: train_accuracy
value: [0.82978723 0.84139265 0.92843327 0.86460348 0.86073501 0.83976834
0.84362934 0.85714286 0.84749035 0.87837838]
mean value: 0.8591360910509847
key: test_fscore
value: [0.375 0.46153846 0.22222222 0.28571429 0.4 0.28571429
0.46153846 0.30769231 0.66666667 0.46153846]
mean value: 0.39276251526251527
key: train_fscore
value: [0.37142857 0.3880597 0.58426966 0.41666667 0.41935484 0.38518519
0.39097744 0.40322581 0.3875969 0.45217391]
mean value: 0.4198938688732906
key: test_precision
value: [0.23076923 0.3 0.16666667 0.18181818 0.25 0.18181818
0.3 0.2 0.5 0.3 ]
mean value: 0.2611072261072261
key: train_precision
value: [0.2300885 0.24299065 0.41935484 0.2688172 0.26804124 0.24074074
0.24528302 0.25773196 0.24509804 0.29545455]
mean value: 0.2713600732946767
key: test_recall
value: [1. 1. 0.33333333 0.66666667 1. 0.66666667
1. 0.66666667 1. 1. ]
mean value: 0.8333333333333334
key: train_recall
value: [0.96296296 0.96296296 0.96296296 0.92592593 0.96296296 0.96296296
0.96296296 0.92592593 0.92592593 0.96296296]
mean value: 0.9518518518518518
key: test_roc_auc
value: [0.90909091 0.93636364 0.62121212 0.75151515 0.91818182 0.75
0.93518519 0.75925926 0.97222222 0.93518519]
mean value: 0.8488215488215488
key: train_roc_auc
value: [0.89270597 0.89882842 0.94474679 0.89357521 0.9090325 0.89797843
0.90001509 0.88964321 0.88455156 0.91834503]
mean value: 0.9029422192049483
key: test_jcc
value: [0.23076923 0.3 0.125 0.16666667 0.25 0.16666667
0.3 0.18181818 0.5 0.3 ]
mean value: 0.2520920745920746
key: train_jcc
value: [0.22807018 0.24074074 0.41269841 0.26315789 0.26530612 0.23853211
0.24299065 0.25252525 0.24038462 0.29213483]
mean value: 0.2676540809731464
MCC on Blind test: 0.43
Accuracy on Blind test: 0.69
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01049566 0.01031566 0.01039696 0.01031137 0.01143765 0.01037836
0.0102942 0.01034856 0.01027584 0.01027036]
mean value: 0.010452461242675782
key: score_time
value: [0.00897431 0.00893593 0.00913882 0.00890827 0.00900865 0.00888872
0.00890207 0.00890517 0.0088551 0.00887251]
mean value: 0.008938956260681152
key: test_mcc
value: [ 0.5508895 0.56713087 0.38251843 -0.04413674 -0.05454545 0.38204659
-0.03149704 0.38204659 0.76011695 0.80903983]
mean value: 0.37036095219850856
key: train_mcc
value: [0.58249009 0.5427084 0.62227177 0.55500996 0.60935992 0.62292056
0.58514128 0.58191044 0.58514128 0.5955795 ]
mean value: 0.5882533192252368
key: test_accuracy
value: [0.94827586 0.96551724 0.94827586 0.9137931 0.89655172 0.94736842
0.92982456 0.94736842 0.96491228 0.98245614]
mean value: 0.9444343617664852
key: train_accuracy
value: [0.95938104 0.95551257 0.96324952 0.95744681 0.96324952 0.96138996
0.95752896 0.96138996 0.95752896 0.96138996]
mean value: 0.9598067257641726
key: test_fscore
value: [0.57142857 0.5 0.4 0. 0. 0.4
0. 0.4 0.75 0.8 ]
mean value: 0.3821428571428572
key: train_fscore
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[warning repeated for the other affected CV folds]
value: [0.60377358 0.56603774 0.64150943 0.57692308 0.62745098 0.64285714
0.60714286 0.6 0.60714286 0.61538462]
mean value: 0.6088222284559688
key: test_precision
value: [0.5 1. 0.5 0. 0. 0.5 0. 0.5 0.6 1. ]
mean value: 0.46
key: train_precision
value: [0.61538462 0.57692308 0.65384615 0.6 0.66666667 0.62068966
0.5862069 0.65217391 0.5862069 0.64 ]
mean value: 0.6198097874139853
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0. 0.33333333 1. 0.66666667]
mean value: 0.36666666666666664
key: train_recall
value: [0.59259259 0.55555556 0.62962963 0.55555556 0.59259259 0.66666667
0.62962963 0.55555556 0.62962963 0.59259259]
mean value: 0.6
key: test_roc_auc
value: [0.81515152 0.66666667 0.65757576 0.48181818 0.47272727 0.65740741
0.49074074 0.65740741 0.98148148 0.83333333]
mean value: 0.6714309764309764
key: train_roc_auc
value: [0.78609221 0.76655329 0.80563114 0.7675737 0.78813303 0.8221317
0.80259486 0.76963114 0.80259486 0.78713133]
mean value: 0.7898067251340455
key: test_jcc
value: [0.4 0.33333333 0.25 0. 0. 0.25
0. 0.25 0.6 0.66666667]
mean value: 0.275
key: train_jcc
value: [0.43243243 0.39473684 0.47222222 0.40540541 0.45714286 0.47368421
0.43589744 0.42857143 0.43589744 0.44444444]
mean value: 0.4380434714645241
MCC on Blind test: 0.2
Accuracy on Blind test: 0.56
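The pipeline printed above MinMax-scales the numerical columns, one-hot encodes the categorical columns and passes any remaining columns through unchanged before handing the data to the model. A minimal sketch of assembling such a prep + model pipeline (the column lists here are an illustrative subset, not the full 169 numerical and 7 categorical features):

# Sketch only: scale numerics, one-hot encode categoricals, pass the rest through.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import BernoulliNB

numeric_cols = ['ligand_distance', 'ligand_affinity_change']       # illustrative subset
categorical_cols = ['ss_class', 'aa_prop_change', 'active_site']   # illustrative subset

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numeric_cols),
                  ('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep), ('model', BernoulliNB())])
# pipe.fit(X_train, y_train) would then be scored fold by fold as in the log.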
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00958323 0.01032209 0.01076818 0.01059055 0.01070786 0.0109334
0.00970054 0.0109117 0.01076508 0.00970888]
mean value: 0.010399150848388671
key: score_time
value: [0.07445121 0.01275253 0.0135901 0.0133009 0.01288939 0.01299
0.01248527 0.01255918 0.01344275 0.01351595]
mean value: 0.019197726249694826
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0. 0.
0. 0.56694671 0.56694671 0. ]
mean value: 0.30773287583715697
key: train_mcc
value: [0.4990914 0.46161651 0.4990914 0.5982906 0.56702937 0.59831103
0.59831103 0.53409532 0.53409532 0.53409532]
mean value: 0.5424027293731967
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.94736842
0.94736842 0.96491228 0.96491228 0.94736842]
mean value: 0.958227465214761
key: train_accuracy
value: [0.96131528 0.95938104 0.96131528 0.96711799 0.96518375 0.96718147
0.96718147 0.96332046 0.96332046 0.96332046]
mean value: 0.9638637670552564
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0. 0. 0.5 0.5 0. ]
mean value: 0.28
key: train_fscore
value: [0.41176471 0.36363636 0.41176471 0.54054054 0.5 0.54054054
0.54054054 0.45714286 0.45714286 0.45714286]
mean value: 0.46802159684512623
key: test_precision
value: [1. 1. 1. 0. 0. 0. 0. 1. 1. 0.]
mean value: 0.5
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0.33333333 0. ]
mean value: 0.19999999999999998
key: train_recall
value: [0.25925926 0.22222222 0.25925926 0.37037037 0.33333333 0.37037037
0.37037037 0.2962963 0.2962963 0.2962963 ]
mean value: 0.3074074074074074
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.5
0.5 0.66666667 0.66666667 0.5 ]
mean value: 0.6
key: train_roc_auc
value: [0.62962963 0.61111111 0.62962963 0.68518519 0.66666667 0.68518519
0.68518519 0.64814815 0.64814815 0.64814815]
mean value: 0.6537037037037037
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0.33333333 0. ]
mean value: 0.19999999999999998
key: train_jcc
value: [0.25925926 0.22222222 0.25925926 0.37037037 0.33333333 0.37037037
0.37037037 0.2962963 0.2962963 0.2962963 ]
mean value: 0.3074074074074074
MCC on Blind test: 0.12
Accuracy on Blind test: 0.52
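Each model block above reports per-fold fit_time, score_time and paired test_*/train_* metrics followed by their means. A minimal sketch, assuming these blocks come from something like sklearn's cross_validate with a custom scorer dictionary; the exact scorer setup used by the script is not shown in this log:

# Sketch only: 10-fold cross-validation with the metric names seen in the log.
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.metrics import (make_scorer, matthews_corrcoef, f1_score,
                             precision_score, recall_score, jaccard_score)

scoring = {'mcc'      : make_scorer(matthews_corrcoef),
           'accuracy' : 'accuracy',
           'fscore'   : make_scorer(f1_score),
           'precision': make_scorer(precision_score),
           'recall'   : make_scorer(recall_score),
           'roc_auc'  : 'roc_auc',
           'jcc'      : make_scorer(jaccard_score)}

# scores = cross_validate(pipe, X_train, y_train, cv=10,
#                         scoring=scoring, return_train_score=True)
# for key, value in scores.items():   # keys: fit_time, score_time, test_mcc, train_mcc, ...
#     print('key:', key, '\nvalue:', value, '\nmean value:', np.mean(value))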
Model_name: SVM
Model func: SVC(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01754379 0.01406407 0.01333332 0.01329112 0.01354933 0.0141921
0.01471233 0.01526451 0.01424551 0.0159626 ]
mean value: 0.014615869522094727
key: score_time
value: [0.01249647 0.00990486 0.0098834 0.00987887 0.0101161 0.00993848
0.00975871 0.01059246 0.0107305 0.0111804 ]
mean value: 0.010448026657104491
key: test_mcc
value: [ 0. 0.56713087 0.56713087 0. 0. 0.
-0.03149704 0.56694671 0. 0.56694671]
mean value: 0.22366581252414472
key: train_mcc
value: [0.53407501 0.53407501 0.56702937 0.5982906 0.65669104 0.53409532
0.62813245 0.62813245 0.53409532 0.53409532]
mean value: 0.5748711879647856
key: test_accuracy
value: [0.94827586 0.96551724 0.96551724 0.94827586 0.94827586 0.94736842
0.92982456 0.96491228 0.94736842 0.96491228]
mean value: 0.9530248033877798
key: train_accuracy
value: [0.96324952 0.96324952 0.96518375 0.96711799 0.97098646 0.96332046
0.96911197 0.96911197 0.96332046 0.96332046]
mean value: 0.9657972562227881
key: test_fscore
value: [0. 0.5 0.5 0. 0. 0. 0. 0.5 0. 0.5]
mean value: 0.2
key: train_fscore
value: [0.45714286 0.45714286 0.5 0.54054054 0.61538462 0.45714286
0.57894737 0.57894737 0.45714286 0.45714286]
mean value: 0.5099534178481546
key: test_precision
value: [0. 1. 1. 0. 0. 0. 0. 1. 0. 1.]
mean value: 0.4
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0. 0.33333333]
mean value: 0.13333333333333333
key: train_recall
value: [0.2962963 0.2962963 0.33333333 0.37037037 0.44444444 0.2962963
0.40740741 0.40740741 0.2962963 0.2962963 ]
mean value: 0.34444444444444444
key: test_roc_auc
value: [0.5 0.66666667 0.66666667 0.5 0.5 0.5
0.49074074 0.66666667 0.5 0.66666667]
mean value: 0.5657407407407408
key: train_roc_auc
value: [0.64814815 0.64814815 0.66666667 0.68518519 0.72222222 0.64814815
0.7037037 0.7037037 0.64814815 0.64814815]
mean value: 0.6722222222222222
key: test_jcc
value: [0. 0.33333333 0.33333333 0. 0. 0.
0. 0.33333333 0. 0.33333333]
mean value: 0.13333333333333333
key: train_jcc
value: [0.2962963 0.2962963 0.33333333 0.37037037 0.44444444 0.2962963
0.40740741 0.40740741 0.2962963 0.2962963 ]
mean value: 0.34444444444444444
MCC on Blind test: 0.11
Accuracy on Blind test: 0.52
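The UndefinedMetricWarning repeated above is raised whenever a fold predicts no positive samples, so precision (and hence F1) is undefined and silently set to 0.0. A minimal sketch of making that behaviour explicit with the `zero_division` parameter the warning itself points to; this is an option, not necessarily what this script does:

# Sketch only: scorers that report 0.0 without warning when no positives are predicted.
from sklearn.metrics import make_scorer, precision_score, f1_score

precision_0 = make_scorer(precision_score, zero_division=0)
f1_0        = make_scorer(f1_score, zero_division=0)
# These scorers can replace the precision/F1 entries in the cross-validation scoring dict.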
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.86374784 2.09029508 1.63843369 2.03394628 1.90827298 1.78310966
1.99839234 2.06304955 1.93170047 1.81367326]
mean value: 1.9124621152877808
key: score_time
value: [0.01261926 0.01259279 0.01266718 0.01298261 0.01273131 0.01226902
0.01244545 0.01506019 0.01257229 0.01240182]
mean value: 0.01283419132232666
key: test_mcc
value: [0.76038268 0.5508895 0.38251843 0. 0.2969697 0.56694671
0.68718427 0.38204659 0.80903983 0.38204659]
mean value: 0.4818024290736844
key: train_mcc
value: [1. 1. 1. 1. 1. 0.9609263
0.98030899 1. 0.9609263 1. ]
mean value: 0.990216159887024
key: test_accuracy
value: [0.96551724 0.94827586 0.94827586 0.94827586 0.93103448 0.96491228
0.94736842 0.94736842 0.98245614 0.94736842]
mean value: 0.9530852994555353
key: train_accuracy
value: [1. 1. 1. 1. 1. 0.996139 0.9980695
1. 0.996139 1. ]
mean value: 0.9990347490347491
key: test_fscore
value: [0.75 0.57142857 0.4 0. 0.33333333 0.5
0.66666667 0.4 0.8 0.4 ]
mean value: 0.48214285714285715
key: train_fscore
value: [1. 1. 1. 1. 1. 0.96296296
0.98113208 1. 0.96296296 1. ]
mean value: 0.9907058001397624
key: test_precision
value: [0.6 0.5 0.5 0. 0.33333333 1.
0.5 0.5 1. 0.5 ]
mean value: 0.5433333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 0.96296296
1. 1. 0.96296296 1. ]
mean value: 0.9925925925925926
key: test_recall
value: [1. 0.66666667 0.33333333 0. 0.33333333 0.33333333
1. 0.33333333 0.66666667 0.33333333]
mean value: 0.5
key: train_recall
value: [1. 1. 1. 1. 1. 0.96296296
0.96296296 1. 0.96296296 1. ]
mean value: 0.9888888888888889
key: test_roc_auc
value: [0.98181818 0.81515152 0.65757576 0.5 0.64848485 0.66666667
0.97222222 0.65740741 0.83333333 0.65740741]
mean value: 0.739006734006734
key: train_roc_auc
value: [1. 1. 1. 1. 1. 0.98046315
0.98148148 1. 0.98046315 1. ]
mean value: 0.9942407784566644
key: test_jcc
value: [0.6 0.4 0.25 0. 0.2 0.33333333
0.5 0.25 0.66666667 0.25 ]
mean value: 0.345
key: train_jcc
value: [1. 1. 1. 1. 1. 0.92857143
0.96296296 1. 0.92857143 1. ]
mean value: 0.982010582010582
MCC on Blind test: 0.24
Accuracy on Blind test: 0.57
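The ConvergenceWarning logged before this block indicates the MLP stopped at its max_iter=500 cap without converging. A minimal sketch of two standard remedies; the specific values are assumptions, not the project's settings:

# Sketch only: either allow more iterations, or stop on a validation plateau.
from sklearn.neural_network import MLPClassifier

mlp_more_iter  = MLPClassifier(max_iter=2000, random_state=42)
mlp_early_stop = MLPClassifier(max_iter=500, early_stopping=True,
                               n_iter_no_change=20, random_state=42)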
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02618551 0.02589345 0.02173519 0.02467251 0.02685261 0.02534461
0.0224154 0.02488923 0.02453709 0.01868343]
mean value: 0.024120903015136717
key: score_time
value: [0.01264358 0.00964165 0.00909233 0.00906181 0.00916147 0.00893998
0.00907493 0.00904655 0.00892591 0.00914407]
mean value: 0.009473228454589843
key: test_mcc
value: [0.5508895 0.64848485 0.56713087 0.56713087 0.2969697 0.76011695
0.55039532 0.64814815 0.80903983 0.2962963 ]
mean value: 0.5694602338563187
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94827586 0.96551724 0.96551724 0.96551724 0.93103448 0.96491228
0.94736842 0.96491228 0.98245614 0.92982456]
mean value: 0.9565335753176043
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.57142857 0.66666667 0.5 0.5 0.33333333 0.75
0.57142857 0.66666667 0.8 0.33333333]
mean value: 0.5692857142857143
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.66666667 1. 1. 0.33333333 0.6
0.5 0.66666667 1. 0.33333333]
mean value: 0.66
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.33333333 0.33333333 1.
0.66666667 0.66666667 0.66666667 0.33333333]
mean value: 0.5666666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.81515152 0.82424242 0.66666667 0.66666667 0.64848485 0.98148148
0.81481481 0.82407407 0.83333333 0.64814815]
mean value: 0.7723063973063973
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.4 0.5 0.33333333 0.33333333 0.2 0.6
0.4 0.5 0.66666667 0.2 ]
mean value: 0.41333333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.25
Accuracy on Blind test: 0.58
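The unconstrained decision tree reaches 1.0 on every train-fold metric while the mean test MCC sits around 0.57, which points to the tree memorising the training folds. A minimal sketch of constraining it; the values are illustrative, not tuned for this data:

# Sketch only: limit depth and leaf size to reduce the train/test gap.
from sklearn.tree import DecisionTreeClassifier

dt_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=42)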
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10347772 0.10542083 0.1026392 0.10236788 0.1086688 0.10966682
0.10839295 0.10804629 0.10971022 0.10941029]
mean value: 0.10678009986877442
key: score_time
value: [0.01768446 0.01904488 0.01783371 0.01776528 0.01919889 0.01932764
0.01858878 0.0196681 0.01852608 0.01737475]
mean value: 0.01850125789642334
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0.56713087 0.56694671
0.38204659 0.38204659 0.80903983 0.64814815]
mean value: 0.5298794082235709
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.96551724 0.96491228
0.94736842 0.94736842 0.98245614 0.96491228]
mean value: 0.9634603750756201
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.5 0.5 0. 0.5 0.5
0.4 0.4 0.8 0.66666667]
mean value: 0.5066666666666667
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0. 1. 1.
0.5 0.5 1. 0.66666667]
mean value: 0.7666666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333
0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.39999999999999997
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.66666667 0.66666667
0.65740741 0.65740741 0.83333333 0.82407407]
mean value: 0.6972222222222222
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333
0.25 0.25 0.66666667 0.5 ]
mean value: 0.36666666666666664
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
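A minimal sketch of how the "MCC on Blind test" and "Accuracy on Blind test" lines can be produced, assuming the fitted pipeline is scored on the held-out blind test split; the variable names X_bts and y_bts are assumed, not taken from the script:

# Sketch only: refit on the training split, score the blind test split.
from sklearn.metrics import matthews_corrcoef, accuracy_score

# pipe.fit(X_train, y_train)
# y_pred = pipe.predict(X_bts)
# print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
# print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))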
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01089549 0.00997782 0.01006174 0.00988054 0.01028514 0.00988078
0.01003456 0.00997853 0.01007175 0.01044822]
mean value: 0.01015145778656006
key: score_time
value: [0.0088129 0.0086453 0.00865483 0.00858808 0.00892234 0.00854993
0.00863981 0.00871801 0.00876451 0.00892496]
mean value: 0.008722066879272461
key: test_mcc
value: [ 0.48301038 0.38251843 0.2969697 -0.04413674 -0.03093441 0.56694671
0.20464687 0.24282147 0. 0.17516462]
mean value: 0.22770070155826885
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93103448 0.94827586 0.93103448 0.9137931 0.93103448 0.96491228
0.89473684 0.9122807 0.94736842 0.87719298]
mean value: 0.9251663641863279
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.4 0.33333333 0. 0. 0.5
0.25 0.28571429 0. 0.22222222]
mean value: 0.24912698412698414
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.4 0.5 0.33333333 0. 0. 1.
0.2 0.25 0. 0.16666667]
mean value: 0.285
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0.33333333 0. 0.33333333]
mean value: 0.26666666666666666
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.80606061 0.65757576 0.64848485 0.48181818 0.49090909 0.66666667
0.62962963 0.63888889 0.5 0.62037037]
mean value: 0.614040404040404
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.25 0.2 0. 0. 0.33333333
0.14285714 0.16666667 0. 0.125 ]
mean value: 0.1551190476190476
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.22
Accuracy on Blind test: 0.57
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline:
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.54438972 1.56727147 1.54567933 1.55488396 1.54870415 1.54974079
1.54354572 1.55656648 1.52714944 1.55768704]
mean value: 1.5495618104934692
key: score_time
value: [0.0920825 0.08973002 0.08945179 0.08959389 0.08978343 0.08988523
0.08968854 0.08932042 0.09021282 0.08947134]
mean value: 0.08992199897766114
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0. 0.56694671
0.38204659 0.38204659 0.56694671 0.80903983]
mean value: 0.46504617707858015
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228
0.94736842 0.94736842 0.96491228 0.98245614]
mean value: 0.9617362371445856
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0.5 0.4 0.4 0.5 0.8]
mean value: 0.44
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 0. 0. 1. 0.5 0.5 1. 1. ]
mean value: 0.7
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0.33333333 0.33333333 0.66666667]
mean value: 0.3333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.66666667
0.65740741 0.65740741 0.66666667 0.83333333]
mean value: 0.6648148148148147
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.25 0.25 0.33333333 0.66666667]
mean value: 0.31666666666666665
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
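Several test folds above score 0.0 precision and recall, i.e. the forest never predicts the minority class in those folds. A minimal sketch of one common counter-measure, class weighting inside the forest; this is an option, not the configuration used in this log:

# Sketch only: re-weight classes per bootstrap sample.
from sklearn.ensemble import RandomForestClassifier

rf_weighted = RandomForestClassifier(n_estimators=1000, class_weight='balanced_subsample',
                                     random_state=42, n_jobs=10)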
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: fit_time
value: [1.79947305 0.99364781 0.97343969 1.00802183 0.97355986 0.97136927
0.97683573 0.94764853 0.94387889 0.96559119]
mean value: 1.0553465843200684
key: score_time
value: [0.22360277 0.19479609 0.35481954 0.11889696 0.24737167 0.24671912
0.25966692 0.24125576 0.23207593 0.26727366]
mean value: 0.23864784240722656
key: test_mcc
value: [0.80917359 0.56713087 0.56713087 0. 0. 0.56694671
0.38204659 0. 0. 0.56694671]
mean value: 0.34593753471007405
key: train_mcc
value: [0.73639347 0.65669104 0.76130255 0.71071615 0.73639347 0.68420281
0.65671091 0.68420281 0.68420281 0.76131958]
mean value: 0.7072135583787191
key: test_accuracy
value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228
0.94736842 0.94736842 0.94736842 0.96491228]
mean value: 0.958227465214761
key: train_accuracy
value: [0.97678917 0.97098646 0.9787234 0.97485493 0.97678917 0.97297297
0.97104247 0.97297297 0.97297297 0.97876448]
mean value: 0.9746869002188151
key: test_fscore
value: [0.8 0.5 0.5 0. 0. 0.5 0.4 0. 0. 0.5]
mean value: 0.32
key: train_fscore
value: [0.71428571 0.61538462 0.74418605 0.68292683 0.71428571 0.65
0.61538462 0.65 0.65 0.74418605]
mean value: 0.6780639581632207
key: test_precision
value: [1. 1. 1. 0. 0. 1. 0.5 0. 0. 1. ]
mean value: 0.55
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.33333333 0. 0. 0.33333333]
mean value: 0.2333333333333333
key: train_recall
value: [0.55555556 0.44444444 0.59259259 0.51851852 0.55555556 0.48148148
0.44444444 0.48148148 0.48148148 0.59259259]
mean value: 0.5148148148148148
key: test_roc_auc
value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.66666667
0.65740741 0.5 0.5 0.66666667]
mean value: 0.6157407407407407
key: train_roc_auc
value: [0.77777778 0.72222222 0.7962963 0.75925926 0.77777778 0.74074074
0.72222222 0.74074074 0.74074074 0.7962963 ]
mean value: 0.7574074074074074
key: test_jcc
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0.25 0. 0. 0.33333333]
mean value: 0.22499999999999998
key: train_jcc
value: [0.55555556 0.44444444 0.59259259 0.51851852 0.55555556 0.48148148
0.44444444 0.48148148 0.48148148 0.59259259]
mean value: 0.5148148148148148
MCC on Blind test: 0.13
Accuracy on Blind test: 0.52
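The FutureWarning emitted while fitting this model notes that max_features='auto' is deprecated for classifiers and will be removed in scikit-learn 1.3. A minimal sketch of the replacement the warning itself suggests, keeping the other logged parameters:

# Sketch only: same "Random Forest2" settings with the non-deprecated max_features.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5, n_estimators=1000,
                             n_jobs=10, oob_score=True, random_state=42)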
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02531791 0.01029682 0.01034307 0.01032424 0.01029944 0.01025152
0.01029849 0.01025391 0.01028061 0.01024294]
mean value: 0.011790895462036132
key: score_time
value: [0.01097441 0.008816 0.00896859 0.00880456 0.00884557 0.00885034
0.00882292 0.00883365 0.00881696 0.00878763]
mean value: 0.009052062034606933
key: test_mcc
value: [ 0.5508895 0.56713087 0.38251843 -0.04413674 -0.05454545 0.38204659
-0.03149704 0.38204659 0.76011695 0.80903983]
mean value: 0.37036095219850856
key: train_mcc
value: [0.58249009 0.5427084 0.62227177 0.55500996 0.60935992 0.62292056
0.58514128 0.58191044 0.58514128 0.5955795 ]
mean value: 0.5882533192252368
key: test_accuracy
value: [0.94827586 0.96551724 0.94827586 0.9137931 0.89655172 0.94736842
0.92982456 0.94736842 0.96491228 0.98245614]
mean value: 0.9444343617664852
key: train_accuracy
value: [0.95938104 0.95551257 0.96324952 0.95744681 0.96324952 0.96138996
0.95752896 0.96138996 0.95752896 0.96138996]
mean value: 0.9598067257641726
key: test_fscore
value: [0.57142857 0.5 0.4 0. 0. 0.4
0. 0.4 0.75 0.8 ]
mean value: 0.3821428571428572
key: train_fscore
value: [0.60377358 0.56603774 0.64150943 0.57692308 0.62745098 0.64285714
0.60714286 0.6 0.60714286 0.61538462]
mean value: 0.6088222284559688
key: test_precision
value: [0.5 1. 0.5 0. 0. 0.5 0. 0.5 0.6 1. ]
mean value: 0.46
key: train_precision
value: [0.61538462 0.57692308 0.65384615 0.6 0.66666667 0.62068966
0.5862069 0.65217391 0.5862069 0.64 ]
mean value: 0.6198097874139853
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333
0. 0.33333333 1. 0.66666667]
mean value: 0.36666666666666664
key: train_recall
value: [0.59259259 0.55555556 0.62962963 0.55555556 0.59259259 0.66666667
0.62962963 0.55555556 0.62962963 0.59259259]
mean value: 0.6
key: test_roc_auc
value: [0.81515152 0.66666667 0.65757576 0.48181818 0.47272727 0.65740741
0.49074074 0.65740741 0.98148148 0.83333333]
mean value: 0.6714309764309764
key: train_roc_auc
value: [0.78609221 0.76655329 0.80563114 0.7675737 0.78813303 0.8221317
0.80259486 0.76963114 0.80259486 0.78713133]
mean value: 0.7898067251340455
key: test_jcc
value: [0.4 0.33333333 0.25 0. 0. 0.25
0. 0.25 0.6 0.66666667]
mean value: 0.275
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: train_jcc
value: [0.43243243 0.39473684 0.47222222 0.40540541 0.45714286 0.47368421
0.43589744 0.42857143 0.43589744 0.44444444]
mean value: 0.4380434714645241
MCC on Blind test: 0.2
Accuracy on Blind test: 0.56
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.10296011 0.07609224 0.06126785 0.07784891 0.08386159 0.0641036
0.06421399 0.07174659 0.06452918 0.06336784]
mean value: 0.07299919128417968
key: score_time
value: [0.01094532 0.01224589 0.01145506 0.01107168 0.01102376 0.01049256
0.01052618 0.01079702 0.01050138 0.01053238]
mean value: 0.010959124565124512
key: test_mcc
value: [ 0.64848485 0.80917359 0.56713087 0. -0.03093441 0.56694671
0.80903983 0.64814815 1. 0.64814815]
mean value: 0.5666137744534676
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96551724 0.98275862 0.96551724 0.94827586 0.93103448 0.96491228
0.98245614 0.96491228 1. 0.96491228]
mean value: 0.9670296430732003
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.8 0.5 0. 0. 0.5
0.8 0.66666667 1. 0.66666667]
mean value: 0.56
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 1. 1. 0. 0. 1.
1. 0.66666667 1. 0.66666667]
mean value: 0.7
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0. 0. 0.33333333
0.66666667 0.66666667 1. 0.66666667]
mean value: 0.5
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82424242 0.83333333 0.66666667 0.5 0.49090909 0.66666667
0.83333333 0.82407407 1. 0.82407407]
mean value: 0.7463299663299663
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.66666667 0.33333333 0. 0. 0.33333333
0.66666667 0.5 1. 0.5 ]
mean value: 0.45
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.58
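
For orientation, the pipeline printed above (MinMaxScaler on the numeric columns, OneHotEncoder on the categorical ones, XGBClassifier as the model, 10-fold CV) can be reproduced in miniature roughly as follows. The toy data, the two column names and the scorer list are illustrative assumptions, not the project's actual ml_data_rt.py configuration:

# Minimal sketch of a MinMaxScaler + OneHotEncoder + XGBoost pipeline under CV.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from xgboost import XGBClassifier

X = pd.DataFrame({
    'ligand_distance': [1.2, 3.4, 2.2, 5.1, 0.7, 4.4, 2.9, 3.8],
    'deepddg':         [-0.5, 0.3, 1.1, -1.2, 0.0, 0.8, -0.3, 0.6],
    'ss_class':        ['H', 'E', 'C', 'H', 'C', 'E', 'H', 'C'],
})
y = [1, 0, 0, 1, 0, 1, 0, 1]

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), ['ligand_distance', 'deepddg']),
                  ('cat', OneHotEncoder(handle_unknown='ignore'), ['ss_class'])],
    remainder='passthrough')

pipe = Pipeline([('prep', prep),
                 ('model', XGBClassifier(random_state=42, verbosity=0))])

scores = cross_validate(pipe, X, y, cv=4, return_train_score=True,
                        scoring=['accuracy', 'f1', 'precision', 'recall',
                                 'roc_auc', 'matthews_corrcoef', 'jaccard'])
print(scores['test_matthews_corrcoef'].mean())

cross_validate with return_train_score=True yields the same kind of fit_time / score_time / test_* / train_* arrays that are logged per fold above; the mcc/jcc labels in this log suggest custom scorer names rather than sklearn's defaults.
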
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0533421 0.05394554 0.07271242 0.05558991 0.0737896 0.08503461
0.06888843 0.09786415 0.09248352 0.04690504]
mean value: 0.07005553245544434
key: score_time
value: [0.01226044 0.01864839 0.01218653 0.01870155 0.01248217 0.0122149
0.01872897 0.03266191 0.02536178 0.0123961 ]
mean value: 0.017564272880554198
key: test_mcc
value: [0.76038268 0.32993527 0.24366266 0.38251843 0.38251843 0.56694671
0.39056329 0.35714286 0.56694671 0.39056329]
mean value: 0.4371180313243308
key: train_mcc
value: [0.84368859 0.8348779 0.90074358 0.8609619 0.87923612 0.83872426
0.86097638 0.87924838 0.86097638 0.83872426]
mean value: 0.859815774084354
key: test_accuracy
value: [0.96551724 0.86206897 0.9137931 0.94827586 0.94827586 0.96491228
0.89473684 0.87719298 0.96491228 0.89473684]
mean value: 0.9234422262552934
key: train_accuracy
value: [0.98452611 0.98452611 0.99032882 0.98646035 0.98839458 0.98455598
0.98648649 0.98841699 0.98648649 0.98455598]
mean value: 0.9864737907291099
key: test_fscore
value: [0.75 0.33333333 0.28571429 0.4 0.4 0.5
0.4 0.36363636 0.5 0.4 ]
mean value: 0.43326839826839825
key: train_fscore
value: [0.85185185 0.84 0.90566038 0.86792453 0.88461538 0.84615385
0.86792453 0.88461538 0.86792453 0.84615385]
mean value: 0.8662824275654464
key: test_precision
value: [0.6 0.22222222 0.25 0.5 0.5 1.
0.28571429 0.25 1. 0.28571429]
mean value: 0.48936507936507934
key: train_precision
value: [0.85185185 0.91304348 0.92307692 0.88461538 0.92 0.88
0.88461538 0.92 0.88461538 0.88 ]
mean value: 0.8941818407035799
key: test_recall
value: [1. 0.66666667 0.33333333 0.33333333 0.33333333 0.33333333
0.66666667 0.66666667 0.33333333 0.66666667]
mean value: 0.5333333333333333
key: train_recall
value: [0.85185185 0.77777778 0.88888889 0.85185185 0.85185185 0.81481481
0.85185185 0.85185185 0.85185185 0.81481481]
mean value: 0.8407407407407408
key: test_roc_auc
value: [0.98181818 0.76969697 0.63939394 0.65757576 0.65757576 0.66666667
0.78703704 0.77777778 0.66666667 0.78703704]
mean value: 0.7391245791245791
key: train_roc_auc
value: [0.92184429 0.88684807 0.94240363 0.9228647 0.92388511 0.90435242
0.92287094 0.92388927 0.92287094 0.90435242]
mean value: 0.9176181778436652
key: test_jcc
value: [0.6 0.2 0.16666667 0.25 0.25 0.33333333
0.25 0.22222222 0.33333333 0.25 ]
mean value: 0.28555555555555556
key: train_jcc
value: [0.74193548 0.72413793 0.82758621 0.76666667 0.79310345 0.73333333
0.76666667 0.79310345 0.76666667 0.73333333]
mean value: 0.7646533185020393
MCC on Blind test: 0.24
Accuracy on Blind test: 0.58
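
The test_mcc/train_mcc rows above are the metric that stays meaningful despite the high baseline accuracy, because MCC uses all four confusion-matrix cells. A small worked example with made-up, fold-sized labels (not this run's data):

# Sketch only: how a reported MCC value relates to a confusion matrix.
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [0]*54 + [1]*3                  # rare positive class, as in one CV fold
y_pred = [0]*53 + [1] + [1]*2 + [0]      # one false positive, one missed positive

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
mcc = (tp*tn - fp*fn) / (((tp+fp)*(tp+fn)*(tn+fp)*(tn+fn)) ** 0.5)
print(round(mcc, 3), round(matthews_corrcoef(y_true, y_pred), 3))   # both ~0.648
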
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01456141 0.01192737 0.01130104 0.01136208 0.01116967 0.01007843
0.0100646 0.01123548 0.00998092 0.01085925]
mean value: 0.011254024505615235
key: score_time
value: [0.01082349 0.00998855 0.00960755 0.00958705 0.00879359 0.00871301
0.0093348 0.00888252 0.00888968 0.00979233]
mean value: 0.009441256523132324
key: test_mcc
value: [ 0.85811633 0.48301038 0.38251843 -0.03093441 0.20563808 0.64814815
0.24282147 0.2962963 0.39056329 0.55039532]
mean value: 0.40265733298043693
key: train_mcc
value: [0.472191 0.46384474 0.49044321 0.47807824 0.55306158 0.49897952
0.48527524 0.51651884 0.42130112 0.48236557]
mean value: 0.4862059055721764
key: test_accuracy
value: [0.98275862 0.93103448 0.94827586 0.93103448 0.89655172 0.96491228
0.9122807 0.92982456 0.89473684 0.94736842]
mean value: 0.9338777979431336
key: train_accuracy
value: [0.94197292 0.94003868 0.94197292 0.93423598 0.9516441 0.94401544
0.93629344 0.94401544 0.93436293 0.94015444]
mean value: 0.9408706302323324
key: test_fscore
value: [0.85714286 0.5 0.4 0. 0.25 0.66666667
0.28571429 0.33333333 0.4 0.57142857]
mean value: 0.42642857142857143
key: train_fscore
value: [0.5 0.49180328 0.51612903 0.5 0.57627119 0.52459016
0.50746269 0.53968254 0.4516129 0.50793651]
mean value: 0.5115488298733711
key: test_precision
value: [0.75 0.4 0.5 0. 0.2 0.66666667
0.25 0.33333333 0.28571429 0.5 ]
mean value: 0.38857142857142857
key: train_precision
value: [0.45454545 0.44117647 0.45714286 0.41463415 0.53125 0.47058824
0.425 0.47222222 0.4 0.44444444]
mean value: 0.4511003830578795
key: test_recall
value: [1. 0.66666667 0.33333333 0. 0.33333333 0.66666667
0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.5
key: train_recall
value: [0.55555556 0.55555556 0.59259259 0.62962963 0.62962963 0.59259259
0.62962963 0.62962963 0.51851852 0.59259259]
mean value: 0.5925925925925926
key: test_roc_auc
value: [0.99090909 0.80606061 0.65757576 0.49090909 0.63030303 0.82407407
0.63888889 0.64814815 0.78703704 0.81481481]
mean value: 0.7288720538720539
key: train_roc_auc
value: [0.75941043 0.75839002 0.77690854 0.79032502 0.79950869 0.77796636
0.79139323 0.79546655 0.73787433 0.7759297 ]
mean value: 0.7763172863623838
key: test_jcc
value: [0.75 0.33333333 0.25 0. 0.14285714 0.5
0.16666667 0.2 0.25 0.4 ]
mean value: 0.29928571428571427
key: train_jcc
value: [0.33333333 0.32608696 0.34782609 0.33333333 0.4047619 0.35555556
0.34 0.36956522 0.29166667 0.34042553]
mean value: 0.34425545864352525
MCC on Blind test: 0.27
Accuracy on Blind test: 0.59
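
One detail worth noting for the Multinomial block: MultinomialNB only accepts non-negative feature values, so the MinMax scaling step in the pipeline above is presumably what makes ddG-style columns (which can be negative) usable at all. A toy sketch with made-up values, not the project's features:

# Toy sketch: MultinomialNB rejects negative inputs, so scale to [0, 1] first.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-1.3, 0.2], [0.5, -0.7], [2.0, 1.1], [1.4, 0.0]])   # contains negatives
y = [0, 0, 1, 1]

X_scaled = MinMaxScaler().fit_transform(X)   # all values now lie in [0, 1]
clf = MultinomialNB().fit(X_scaled, y)       # fitting raw X would raise ValueError
print(clf.predict(X_scaled))
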
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01444721 0.01985002 0.02024388 0.02129292 0.02039051 0.01963663
0.01987505 0.01990676 0.02227616 0.02153397]
mean value: 0.019945311546325683
key: score_time
value: [0.01014018 0.01180124 0.01179695 0.01186323 0.01182628 0.01188421
0.01184916 0.01185966 0.01190591 0.01182818]
mean value: 0.011675500869750976
key: test_mcc
value: [0.76038268 0.56713087 0.38251843 0.43192347 0.5508895 0.56694671
0.55039532 0.38204659 0.56694671 0.48238191]
mean value: 0.5241562187130722
key: train_mcc
value: [0.87704592 0.76167305 0.98030696 0.56113172 0.89861406 0.7364114
0.83254034 0.87924838 0.87705763 0.8711254 ]
mean value: 0.8275154861185016
key: test_accuracy
value: [0.96551724 0.96551724 0.94827586 0.9137931 0.94827586 0.96491228
0.94736842 0.94736842 0.96491228 0.92982456]
mean value: 0.9495765275257109
key: train_accuracy
value: [0.98839458 0.9787234 0.99806576 0.9032882 0.99032882 0.97683398
0.98455598 0.98841699 0.98841699 0.98455598]
mean value: 0.9781580696474313
key: test_fscore
value: [0.75 0.5 0.4 0.44444444 0.57142857 0.5
0.57142857 0.4 0.5 0.5 ]
mean value: 0.5137301587301587
key: train_fscore
value: [0.88 0.75555556 0.98113208 0.51923077 0.90196078 0.71428571
0.83333333 0.88461538 0.88 0.87096774]
mean value: 0.8221081358741665
key: test_precision
value: [0.6 1. 0.5 0.33333333 0.5 1.
0.5 0.5 1. 0.4 ]
mean value: 0.6333333333333333
key: train_precision
value: [0.95652174 0.94444444 1. 0.35064935 0.95833333 1.
0.95238095 0.92 0.95652174 0.77142857]
mean value: 0.8810280130497522
key: test_recall
value: [1. 0.33333333 0.33333333 0.66666667 0.66666667 0.33333333
0.66666667 0.33333333 0.33333333 0.66666667]
mean value: 0.5333333333333333
key: train_recall
value: [0.81481481 0.62962963 0.96296296 1. 0.85185185 0.55555556
0.74074074 0.85185185 0.81481481 1. ]
mean value: 0.8222222222222222
key: test_roc_auc
value: [0.98181818 0.66666667 0.65757576 0.7969697 0.81515152 0.66666667
0.81481481 0.65740741 0.66666667 0.80555556]
mean value: 0.7529292929292929
key: train_roc_auc
value: [0.906387 0.81379441 0.98148148 0.94897959 0.92490552 0.77777778
0.86935204 0.92388927 0.90638908 0.99185336]
mean value: 0.9044809519191248
key: test_jcc
value: [0.6 0.33333333 0.25 0.28571429 0.4 0.33333333
0.4 0.25 0.33333333 0.33333333]
mean value: 0.3519047619047619
key: train_jcc
value: [0.78571429 0.60714286 0.96296296 0.35064935 0.82142857 0.55555556
0.71428571 0.79310345 0.78571429 0.77142857]
mean value: 0.7147985603158017
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01839423 0.01934814 0.01817775 0.01901317 0.01911378 0.01753163
0.01846671 0.0182879 0.01776218 0.01677585]
mean value: 0.018287134170532227
key: score_time
value: [0.01187444 0.01188636 0.01183009 0.011832 0.01181626 0.01186419
0.01186252 0.01177549 0.01186895 0.01181245]
mean value: 0.01184227466583252
key: test_mcc
value: [0.76038268 0.45726459 0.56713087 0.38251843 0.2969697 0.56694671
0.55039532 0.64814815 0.76011695 0.24638016]
mean value: 0.5236253563585851
key: train_mcc
value: [0.88276644 0.4252731 0.73639347 0.82784768 0.92184429 0.85459272
0.80911474 0.78890378 0.65218543 0.49165645]
mean value: 0.7390578100781762
key: test_accuracy
value: [0.96551724 0.84482759 0.96551724 0.94827586 0.93103448 0.96491228
0.94736842 0.96491228 0.96491228 0.78947368]
mean value: 0.9286751361161525
key: train_accuracy
value: [0.98839458 0.83172147 0.97678917 0.98065764 0.99226306 0.98648649
0.98262548 0.97297297 0.94208494 0.86679537]
mean value: 0.9520791169727341
key: test_fscore
value: [0.75 0.4 0.5 0.4 0.33333333 0.5
0.57142857 0.66666667 0.75 0.25 ]
mean value: 0.5121428571428571
key: train_fscore
value: [0.88888889 0.37410072 0.71428571 0.83333333 0.92592593 0.85106383
0.8 0.78787879 0.63414634 0.43902439]
mean value: 0.7248647931231662
key: test_precision
value: [0.6 0.25 1. 0.5 0.33333333 1.
0.5 0.66666667 0.6 0.15384615]
mean value: 0.5603846153846154
key: train_precision
value: [0.88888889 0.23214286 1. 0.75757576 0.92592593 1.
1. 0.66666667 0.47272727 0.28125 ]
mean value: 0.7225177368927369
key: test_recall
value: [1. 1. 0.33333333 0.33333333 0.33333333 0.33333333
0.66666667 0.66666667 1. 0.66666667]
mean value: 0.6333333333333333
key: train_recall
value: [0.88888889 0.96296296 0.55555556 0.92592593 0.92592593 0.74074074
0.66666667 0.96296296 0.96296296 1. ]
mean value: 0.8592592592592593
key: test_roc_auc
value: [0.98181818 0.91818182 0.66666667 0.65757576 0.64848485 0.66666667
0.81481481 0.82407407 0.98148148 0.73148148]
mean value: 0.7891245791245791
key: train_roc_auc
value: [0.94138322 0.89372638 0.77777778 0.9547997 0.96092215 0.87037037
0.83333333 0.96824319 0.95194991 0.92973523]
mean value: 0.9082241264915109
key: test_jcc
value: [0.6 0.25 0.33333333 0.25 0.2 0.33333333
0.4 0.5 0.6 0.14285714]
mean value: 0.36095238095238097
key: train_jcc
value: [0.8 0.2300885 0.55555556 0.71428571 0.86206897 0.74074074
0.66666667 0.65 0.46428571 0.28125 ]
mean value: 0.5964941852626854
MCC on Blind test: 0.33
Accuracy on Blind test: 0.61
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.20579267 0.20070958 0.19566107 0.1949234 0.19549155 0.1920774
0.19445324 0.19372702 0.19367528 0.19807458]
mean value: 0.1964585781097412
key: score_time
value: [0.01597142 0.01675367 0.01633692 0.0162847 0.01555514 0.01612735
0.01671433 0.01590085 0.01547456 0.01589966]
mean value: 0.016101861000061037
key: test_mcc
value: [0.43192347 0.76038268 0.56713087 0. 0.56713087 0.
0.38204659 0.55039532 0.80903983 0.2962963 ]
mean value: 0.4364345939424672
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9137931 0.96551724 0.96551724 0.94827586 0.96551724 0.94736842
0.94736842 0.94736842 0.98245614 0.92982456]
mean value: 0.9513006654567453
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.44444444 0.75 0.5 0. 0.5 0.
0.4 0.57142857 0.8 0.33333333]
mean value: 0.4299206349206349
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.33333333 0.6 1. 0. 1. 0.
0.5 0.5 1. 0.33333333]
mean value: 0.5266666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 1. 0.33333333 0. 0.33333333 0.
0.33333333 0.66666667 0.66666667 0.33333333]
mean value: 0.4333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.7969697 0.98181818 0.66666667 0.5 0.66666667 0.5
0.65740741 0.81481481 0.83333333 0.64814815]
mean value: 0.7065824915824915
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.28571429 0.6 0.33333333 0. 0.33333333 0.
0.25 0.4 0.66666667 0.2 ]
mean value: 0.3069047619047619
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.26
Accuracy on Blind test: 0.58
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.05620265 0.06906557 0.07550597 0.0792644 0.08082414 0.07877135
0.06620526 0.07880783 0.06887841 0.0823257 ]
mean value: 0.07358512878417969
key: score_time
value: [0.02078819 0.02542567 0.02489471 0.03681445 0.02469873 0.0268507
0.02675176 0.02348495 0.02814078 0.03065825]
mean value: 0.02685081958770752
key: test_mcc
value: [ 0.64848485 0.80917359 0.56713087 0.56713087 -0.03093441 0.80903983
0.38204659 0.80903983 0.56694671 0.38204659]
mean value: 0.5510105333468213
key: train_mcc
value: [0.93993608 0.93993608 0.98030696 0.98030696 0.89861406 0.98030899
0.96029664 0.96029664 0.94053145 0.98030899]
mean value: 0.9560842846865666
key: test_accuracy
value: [0.96551724 0.98275862 0.96551724 0.96551724 0.93103448 0.98245614
0.94736842 0.98245614 0.96491228 0.94736842]
mean value: 0.9634906231094978
key: train_accuracy
value: [0.99419729 0.99419729 0.99806576 0.99806576 0.99032882 0.9980695
0.996139 0.996139 0.99420849 0.9980695 ]
mean value: 0.9957480414927223
key: test_fscore
value: [0.66666667 0.8 0.5 0.5 0. 0.8
0.4 0.8 0.5 0.4 ]
mean value: 0.5366666666666666
key: train_fscore
value: [0.94117647 0.94117647 0.98113208 0.98113208 0.90196078 0.98113208
0.96153846 0.96153846 0.94339623 0.98113208]
mean value: 0.9575315176869006
key: test_precision
value: [0.66666667 1. 1. 1. 0. 1.
0.5 1. 1. 0.5 ]
mean value: 0.7666666666666666
key: train_precision
value: [1. 1. 1. 1. 0.95833333 1.
1. 1. 0.96153846 1. ]
mean value: 0.9919871794871795
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.33333333 0. 0.66666667
0.33333333 0.66666667 0.33333333 0.33333333]
mean value: 0.4333333333333333
key: train_recall
value: [0.88888889 0.88888889 0.96296296 0.96296296 0.85185185 0.96296296
0.92592593 0.92592593 0.92592593 0.96296296]
mean value: 0.9259259259259259
key: test_roc_auc
value: [0.82424242 0.83333333 0.66666667 0.66666667 0.49090909 0.83333333
0.65740741 0.83333333 0.66666667 0.65740741]
mean value: 0.712996632996633
key: train_roc_auc
value: [0.94444444 0.94444444 0.98148148 0.98148148 0.92490552 0.98148148
0.96296296 0.96296296 0.96194463 0.98148148]
mean value: 0.9627590891527464
key: test_jcc
value: [0.5 0.66666667 0.33333333 0.33333333 0. 0.66666667
0.25 0.66666667 0.33333333 0.25 ]
mean value: 0.39999999999999997
key: train_jcc
value: [0.88888889 0.88888889 0.96296296 0.96296296 0.82142857 0.96296296
0.92592593 0.92592593 0.89285714 0.96296296]
mean value: 0.9195767195767195
MCC on Blind test: 0.22
Accuracy on Blind test: 0.56
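
The "Some inputs do not have OOB scores" / "invalid value encountered in true_divide" warnings emitted while this Bagging Classifier block ran come from the out-of-bag estimate: with only the default 10 estimators, some training rows are never left out of any bootstrap sample. A hedged sketch of the effect on toy data; the estimator counts are illustrative, not the project's settings:

# Sketch: raising n_estimators usually gives every sample at least one out-of-bag
# prediction, avoiding the OOB warnings seen above.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, n_features=20, weights=[0.95], random_state=42)

few  = BaggingClassifier(n_estimators=10,  oob_score=True, random_state=42).fit(X, y)   # may warn
many = BaggingClassifier(n_estimators=200, oob_score=True, random_state=42).fit(X, y)   # OOB well defined
print(few.oob_score_, many.oob_score_)
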
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.1794157 0.19253588 0.17404079 0.15018892 0.12815928 0.18164134
0.12826395 0.17812467 0.17865038 0.18234944]
mean value: 0.1673370361328125
key: score_time
value: [0.02564001 0.02518463 0.01993012 0.0287869 0.02011561 0.01578426
0.02864242 0.02854109 0.02530122 0.02950859]
mean value: 0.024743485450744628
key: test_mcc
value: [0.56713087 0.56713087 0. 0. 0. 0.
0. 0. 0.56694671 0. ]
mean value: 0.17012084551450418
key: train_mcc
value: [0.93993608 0.91921394 0.96029266 0.91921394 0.93993608 0.9399419
0.91922152 0.91922152 0.91922152 0.91922152]
mean value: 0.9295420672150851
key: test_accuracy
value: [0.96551724 0.96551724 0.94827586 0.94827586 0.94827586 0.94736842
0.94736842 0.94736842 0.96491228 0.94736842]
mean value: 0.9530248033877797
key: train_accuracy
value: [0.99419729 0.99226306 0.99613153 0.99226306 0.99419729 0.99420849
0.99227799 0.99227799 0.99227799 0.99227799]
mean value: 0.9932372687691837
key: test_fscore
value: [0.5 0.5 0. 0. 0. 0. 0. 0. 0.5 0. ]
mean value: 0.15
key: train_fscore
value: [0.94117647 0.92 0.96153846 0.92 0.94117647 0.94117647
0.92 0.92 0.92 0.92 ]
mean value: 0.9305067873303168
key: test_precision
value: [1. 1. 0. 0. 0. 0. 0. 0. 1. 0.]
mean value: 0.3
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.33333333 0.33333333 0. 0. 0. 0.
0. 0. 0.33333333 0. ]
mean value: 0.09999999999999999
key: train_recall
value: [0.88888889 0.85185185 0.92592593 0.85185185 0.88888889 0.88888889
0.85185185 0.85185185 0.85185185 0.85185185]
mean value: 0.8703703703703703
key: test_roc_auc
value: [0.66666667 0.66666667 0.5 0.5 0.5 0.5
0.5 0.5 0.66666667 0.5 ]
mean value: 0.55
key: train_roc_auc
value: [0.94444444 0.92592593 0.96296296 0.92592593 0.94444444 0.94444444
0.92592593 0.92592593 0.92592593 0.92592593]
mean value: 0.9351851851851852
key: test_jcc
value: [0.33333333 0.33333333 0. 0. 0. 0.
0. 0. 0.33333333 0. ]
mean value: 0.09999999999999999
key: train_jcc
value: [0.88888889 0.85185185 0.92592593 0.85185185 0.88888889 0.88888889
0.85185185 0.85185185 0.85185185 0.85185185]
mean value: 0.8703703703703703
MCC on Blind test: 0.09
Accuracy on Blind test: 0.52
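
The Gaussian Process fold scores above (accuracy around 0.95 but recall near 0.1, with several all-zero folds) are close to what a majority-class baseline achieves on labels this imbalanced, which is why the MCC, F1 and JCC rows are the more informative ones in this log. A small baseline sketch with made-up labels, not this run's split:

# Sketch: a majority-class baseline scores ~0.95 accuracy on toy imbalanced labels
# while its F1 and MCC are 0, mirroring the degenerate folds above.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [0]*54 + [1]*3
baseline = DummyClassifier(strategy='most_frequent').fit([[0]]*len(y_true), y_true)
y_pred = baseline.predict([[0]]*len(y_true))

print(accuracy_score(y_true, y_pred))              # ~0.947
print(f1_score(y_true, y_pred, zero_division=0))   # 0.0
print(matthews_corrcoef(y_true, y_pred))           # 0.0
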
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.78224468 0.77179456 0.7493031 0.78262949 0.78038216 0.78873324
0.78629851 0.78719425 0.77175832 0.78670549]
mean value: 0.7787043809890747
key: score_time
value: [0.00961399 0.00944686 0.01149893 0.0093987 0.01046252 0.00997686
0.0101912 0.00940299 0.01038051 0.01013303]
mean value: 0.010050559043884277
key: test_mcc
value: [0.64848485 0.64848485 0.56713087 0.56713087 0.56713087 0.64814815
0.64814815 0.64814815 1. 0.55039532]
mean value: 0.6493202081863393
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96551724 0.96551724 0.96551724 0.96551724 0.96551724 0.96491228
0.96491228 0.96491228 1. 0.94736842]
mean value: 0.9669691470054447
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.66666667 0.5 0.5 0.5 0.66666667
0.66666667 0.66666667 1. 0.57142857]
mean value: 0.6404761904761904
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.66666667 1. 1. 1. 0.66666667
0.66666667 0.66666667 1. 0.5 ]
mean value: 0.7833333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0.33333333 0.33333333 0.66666667
0.66666667 0.66666667 1. 0.66666667]
mean value: 0.6
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82424242 0.82424242 0.66666667 0.66666667 0.66666667 0.82407407
0.82407407 0.82407407 1. 0.81481481]
mean value: 0.7935521885521886
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.5 0.33333333 0.33333333 0.33333333 0.5
0.5 0.5 1. 0.4 ]
mean value: 0.49
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.58
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.0282979 0.02876115 0.02958274 0.02728486 0.02812648 0.02823567
0.02803469 0.02771449 0.02794123 0.02837539]
mean value: 0.028235459327697755
key: score_time
value: [0.0126543 0.01246667 0.01491022 0.01432562 0.01413298 0.01427579
0.01446295 0.01423359 0.01441836 0.0146451 ]
mean value: 0.014052557945251464
key: test_mcc
value: [ 0.68755165 0.1762953 -0.05454545 -0.04413674 -0.05454545 0.24282147
-0.05555556 0.1134023 0.17516462 0.2962963 ]
mean value: 0.14827484228206453
key: train_mcc
value: [0.37617287 0.4990914 0.53407501 0.4990914 0.4990914 0.46163583
0.53409532 0.56704983 0.46163583 0.46163583]
mean value: 0.4893574733872949
key: test_accuracy
value: [0.94827586 0.87931034 0.89655172 0.9137931 0.89655172 0.9122807
0.89473684 0.8245614 0.87719298 0.92982456]
mean value: 0.897307924984876
key: train_accuracy
value: [0.95551257 0.96131528 0.96324952 0.96131528 0.96131528 0.95945946
0.96332046 0.96525097 0.95945946 0.95945946]
mean value: 0.9609657737317312
key: test_fscore
value: [0.66666667 0.22222222 0. 0. 0. 0.28571429
0. 0.16666667 0.22222222 0.33333333]
mean value: 0.18968253968253967
key: train_fscore
value: [0.25806452 0.41176471 0.45714286 0.41176471 0.41176471 0.36363636
0.45714286 0.5 0.36363636 0.36363636]
mean value: 0.3998553438970896
key: test_precision
value: [0.5 0.16666667 0. 0. 0. 0.25
0. 0.11111111 0.16666667 0.33333333]
mean value: 0.15277777777777776
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.33333333 0. 0. 0. 0.33333333
0. 0.33333333 0.33333333 0.33333333]
mean value: 0.26666666666666666
key: train_recall
value: [0.14814815 0.25925926 0.2962963 0.25925926 0.25925926 0.22222222
0.2962963 0.33333333 0.22222222 0.22222222]
mean value: 0.2518518518518518
key: test_roc_auc
value: [0.97272727 0.62121212 0.47272727 0.48181818 0.47272727 0.63888889
0.47222222 0.59259259 0.62037037 0.64814815]
mean value: 0.5993434343434343
key: train_roc_auc
value: [0.57407407 0.62962963 0.64814815 0.62962963 0.62962963 0.61111111
0.64814815 0.66666667 0.61111111 0.61111111]
mean value: 0.625925925925926
key: test_jcc
value: [0.5 0.125 0. 0. 0. 0.16666667
0. 0.09090909 0.125 0.2 ]
mean value: 0.12075757575757576
key: train_jcc
value: [0.14814815 0.25925926 0.2962963 0.25925926 0.25925926 0.22222222
0.2962963 0.33333333 0.22222222 0.22222222]
mean value: 0.2518518518518518
MCC on Blind test: -0.05
Accuracy on Blind test: 0.5
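Several of the test folds above have precision 0.0 because the model predicts no minority-class samples in those folds; that is also what triggers the UndefinedMetricWarning about ill-defined precision seen in this log. A small, self-contained illustration (not part of the original script) of that behaviour and of the zero_division parameter the warning points to:

    # When no positives are predicted, precision is undefined; sklearn reports
    # 0.0 and warns unless zero_division is set explicitly.
    from sklearn.metrics import precision_score

    y_true = [0, 0, 1, 1]
    y_pred = [0, 0, 0, 0]   # classifier never predicts the positive class

    print(precision_score(y_true, y_pred))                   # 0.0, with UndefinedMetricWarning
    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0, warning suppressed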
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02868271 0.03869915 0.03853083 0.03858232 0.03899741 0.03781533
0.04338479 0.03836989 0.03853822 0.03898597]
mean value: 0.03805866241455078
key: score_time
value: [0.01904464 0.0187161 0.01863718 0.01865602 0.0190537 0.01886559
0.01899195 0.01890612 0.01882076 0.01871467]
mean value: 0.018840670585632324
key: test_mcc
value: [1. 0.80917359 0.56713087 0.56713087 0.38251843 0.80903983
0.2962963 0.64814815 0.56694671 0.64814815]
mean value: 0.6294532902524936
key: train_mcc
value: [0.73676407 0.78589709 0.8325254 0.8321053 0.76167305 0.76169209
0.76169209 0.76169209 0.76131958 0.80951339]
mean value: 0.7804874132160566
key: test_accuracy
value: [1. 0.98275862 0.96551724 0.96551724 0.94827586 0.98245614
0.92982456 0.96491228 0.96491228 0.96491228]
mean value: 0.9669086509376891
key: train_accuracy
value: [0.97678917 0.98065764 0.98452611 0.98452611 0.9787234 0.97876448
0.97876448 0.97876448 0.97876448 0.98262548]
mean value: 0.9802905834820729
key: test_fscore
value: [1. 0.8 0.5 0.5 0.4 0.8
0.33333333 0.66666667 0.5 0.66666667]
mean value: 0.6166666666666667
key: train_fscore
value: [0.72727273 0.7826087 0.83333333 0.82608696 0.75555556 0.75555556
0.75555556 0.75555556 0.74418605 0.80851064]
mean value: 0.7744220619811697
key: test_precision
value: [1. 1. 1. 1. 0.5 1.
0.33333333 0.66666667 1. 0.66666667]
mean value: 0.8166666666666667
key: train_precision
value: [0.94117647 0.94736842 0.95238095 1. 0.94444444 0.94444444
0.94444444 0.94444444 1. 0.95 ]
mean value: 0.9568703621799597
key: test_recall
value: [1. 0.66666667 0.33333333 0.33333333 0.33333333 0.66666667
0.33333333 0.66666667 0.33333333 0.66666667]
mean value: 0.5333333333333333
key: train_recall
value: [0.59259259 0.66666667 0.74074074 0.7037037 0.62962963 0.62962963
0.62962963 0.62962963 0.59259259 0.7037037 ]
mean value: 0.6518518518518519
key: test_roc_auc
value: [1. 0.83333333 0.66666667 0.66666667 0.65757576 0.83333333
0.64814815 0.82407407 0.66666667 0.82407407]
mean value: 0.762053872053872
key: train_roc_auc
value: [0.79527589 0.83231293 0.86934996 0.85185185 0.81379441 0.81379648
0.81379648 0.81379648 0.7962963 0.85083352]
mean value: 0.8251104306850597
key: test_jcc
value: [1. 0.66666667 0.33333333 0.33333333 0.25 0.66666667
0.2 0.5 0.33333333 0.5 ]
mean value: 0.47833333333333333
key: train_jcc
value: [0.57142857 0.64285714 0.71428571 0.7037037 0.60714286 0.60714286
0.60714286 0.60714286 0.59259259 0.67857143]
mean value: 0.6332010582010582
MCC on Blind test: 0.25
Accuracy on Blind test: 0.57
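The two blind-test lines reported per model are consistent with refitting the pipeline on the full training split and scoring the held-out blind set once, with MCC and accuracy as the scorers. A toy example of those two calls (assumed usage, with made-up labels, not values from this run):

    # Illustrative only: blind-test MCC and accuracy from a fitted model's predictions.
    from sklearn.metrics import matthews_corrcoef, accuracy_score

    y_bts      = [0, 1, 0, 1, 1, 0]   # toy blind-test labels
    y_bts_pred = [0, 1, 1, 1, 0, 0]   # toy predictions from a fitted pipeline

    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_bts_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_bts_pred), 2))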
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.21960998 0.27916813 0.2967515 0.17903543 0.30591917 0.16244292
0.22275519 0.29180765 0.29906225 0.32112122]
mean value: 0.2577673435211182
key: score_time
value: [0.01901317 0.01612353 0.0195241 0.01285934 0.0123415 0.01899672
0.02589607 0.02119637 0.01926637 0.02765989]
mean value: 0.019287705421447754
key: test_mcc
value: [0.80917359 0.80917359 0.56713087 0. 0.38251843 0.80903983
0.2962963 0.64814815 0.56694671 0.64814815]
mean value: 0.5536575623422023
key: train_mcc
value: [0.68418345 0.65669104 0.8325254 0.71071615 0.76167305 0.76169209
0.76169209 0.76169209 0.76131958 0.65717642]
mean value: 0.7349361348491141
key: test_accuracy
value: [0.98275862 0.98275862 0.96551724 0.94827586 0.94827586 0.98245614
0.92982456 0.96491228 0.96491228 0.96491228]
mean value: 0.9634603750756201
key: train_accuracy
value: [0.9729207 0.97098646 0.98452611 0.97485493 0.9787234 0.97876448
0.97876448 0.97876448 0.97876448 0.97104247]
mean value: 0.9768111991516247
key: test_fscore
value: [0.8 0.8 0.5 0. 0.4 0.8
0.33333333 0.66666667 0.5 0.66666667]
mean value: 0.5466666666666666
key: train_fscore
value: [0.65       0.61538462 0.83333333 0.68292683 0.75555556 0.75555556
 0.75555556 0.75555556 0.74418605 0.63414634]
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:117: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
mean value: 0.7182199388183507
key: test_precision
value: [1. 1. 1. 0. 0.5 1.
0.33333333 0.66666667 1. 0.66666667]
mean value: 0.7166666666666667
key: train_precision
value: [1. 1. 0.95238095 1. 0.94444444 0.94444444
0.94444444 0.94444444 1. 0.92857143]
mean value: 0.9658730158730159
key: test_recall
value: [0.66666667 0.66666667 0.33333333 0. 0.33333333 0.66666667
0.33333333 0.66666667 0.33333333 0.66666667]
mean value: 0.4666666666666666
key: train_recall
value: [0.48148148 0.44444444 0.74074074 0.51851852 0.62962963 0.62962963
0.62962963 0.62962963 0.59259259 0.48148148]
mean value: 0.5777777777777777
key: test_roc_auc
value: [0.83333333 0.83333333 0.66666667 0.5 0.65757576 0.83333333
0.64814815 0.82407407 0.66666667 0.82407407]
mean value: 0.7287205387205387
key: train_roc_auc
value: [0.74074074 0.72222222 0.86934996 0.75925926 0.81379441 0.81379648
0.81379648 0.81379648 0.7962963 0.73972241]
mean value: 0.7882774752806758
key: test_jcc
value: [0.66666667 0.66666667 0.33333333 0. 0.25 0.66666667
0.2 0.5 0.33333333 0.5 ]
mean value: 0.4116666666666666
key: train_jcc
value: [0.48148148 0.44444444 0.71428571 0.51851852 0.60714286 0.60714286
0.60714286 0.60714286 0.59259259 0.46428571]
mean value: 0.5644179894179894
MCC on Blind test: 0.21
Accuracy on Blind test: 0.55
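The SettingWithCopyWarning emitted from rpob_rt.py above comes from calling sort_values(..., inplace=True) on what pandas considers a slice of another DataFrame. The usual remedy, sketched here on a made-up results table (the baseline_CT/baseline_BT names and sort columns mirror the warning text; the data do not), is to take an explicit copy before mutating:

    # Typical fix for the SettingWithCopyWarning seen above: copy the slice first.
    import pandas as pd

    results = pd.DataFrame({'model': ['LR', 'RF', 'QDA'],
                            'test_mcc': [0.95, 0.88, 0.15],
                            'bts_mcc': [0.30, 0.25, -0.05]})

    baseline_CT = results[['model', 'test_mcc']].copy()
    baseline_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

    baseline_BT = results[['model', 'bts_mcc']].copy()
    baseline_BT.sort_values(by=['bts_mcc'], ascending=False, inplace=True)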
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04487562 0.04138589 0.04369426 0.04303861 0.04680061 0.04244494
0.03946328 0.04193759 0.04206204 0.03964663]
mean value: 0.04253494739532471
key: score_time
value: [0.01250577 0.01236224 0.01433182 0.01480269 0.02338958 0.01450467
0.01239443 0.01240659 0.01476336 0.0124073 ]
mean value: 0.014386844635009766
key: test_mcc
value: [0.88989899 0.92724828 0.98181818 0.94511785 0.96396098 0.96393713
0.94635821 0.98181211 0.96396098 0.92905936]
mean value: 0.9493172082658792
key: train_mcc
value: [0.97573744 0.96750835 0.96550335 0.96142511 0.96543927 0.96343147
0.97158737 0.96750942 0.96550464 0.97158737]
mean value: 0.9675233803087637
key: test_accuracy
value: [0.94495413 0.96330275 0.99082569 0.97247706 0.98165138 0.98165138
0.97247706 0.99082569 0.98165138 0.96330275]
mean value: 0.9743119266055046
key: train_accuracy
value: [0.98776758 0.98369011 0.98267074 0.98063201 0.98267074 0.98165138
0.98572885 0.98369011 0.98267074 0.98572885]
mean value: 0.9836901121304791
key: test_fscore
value: [0.94444444 0.96363636 0.99082569 0.97247706 0.98181818 0.98214286
0.97345133 0.99099099 0.98148148 0.96491228]
mean value: 0.974618067994328
key: train_fscore
value: [0.98790323 0.98383838 0.98284561 0.98082745 0.98281092 0.98178138
0.98582996 0.98380567 0.98281092 0.98582996]
mean value: 0.9838283470967917
key: test_precision
value: [0.94444444 0.94642857 0.98181818 0.96363636 0.96428571 0.96491228
0.94827586 0.98214286 1. 0.93220339]
mean value: 0.962814766535736
key: train_precision
value: [0.97804391 0.9759519 0.974 0.972 0.97590361 0.97389558
0.97791165 0.97590361 0.9739479 0.97791165]
mean value: 0.9755469816192518
key: test_recall
value: [0.94444444 0.98148148 1. 0.98148148 1. 1.
1. 1. 0.96363636 1. ]
mean value: 0.9871043771043772
key: train_recall
value: [0.99796334 0.99185336 0.99185336 0.9898167 0.9898167 0.98979592
0.99387755 0.99183673 0.99183673 0.99387755]
mean value: 0.992252795211771
key: test_roc_auc
value: [0.94494949 0.96346801 0.99090909 0.97255892 0.98181818 0.98148148
0.97222222 0.99074074 0.98181818 0.96296296]
mean value: 0.9742929292929293
key: train_roc_auc
value: [0.98775718 0.98368178 0.98266137 0.98062264 0.98266345 0.98165967
0.98573715 0.98369841 0.98268008 0.98573715]
mean value: 0.9836898873602394
key: test_jcc
value: [0.89473684 0.92982456 0.98181818 0.94642857 0.96428571 0.96491228
0.94827586 0.98214286 0.96363636 0.93220339]
mean value: 0.9508264624421688
key: train_jcc
value: [0.97609562 0.96819085 0.96626984 0.96237624 0.96620278 0.96421471
0.97205589 0.96812749 0.96620278 0.97205589]
mean value: 0.9681792096111226
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
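The pipeline printed above min-max scales the 169 numerical columns and one-hot encodes the 7 categorical ones before handing the data to the classifier. A minimal sketch of that layout, with the numerical column list shortened for illustration:

    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

    # Illustrative subset of the 169 numerical feature names shown in the log.
    num_cols = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change']
    cat_cols = ['ss_class', 'aa_prop_change', 'electrostatics_change',
                'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']

    prep = ColumnTransformer(remainder='passthrough',
                             transformers=[('num', MinMaxScaler(), num_cols),
                                           ('cat', OneHotEncoder(), cat_cols)])
    pipe = Pipeline(steps=[('prep', prep),
                           ('model', LogisticRegressionCV(random_state=42))])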
key: fit_time
value: [0.99837112 1.1198895 1.16606784 1.38026643 1.1718266 1.01276398
1.08470702 1.11438894 1.01340914 1.17820048]
mean value: 1.1239891052246094
key: score_time
value: [0.01469374 0.01463389 0.02088499 0.01533866 0.01591563 0.01322365
0.01592922 0.01325226 0.02350879 0.01339555]
mean value: 0.016077637672424316
key: test_mcc
value: [0.90841751 0.92724828 0.98181818 0.96329966 0.96396098 0.96393713
0.92905936 0.96329966 0.94511785 0.92905936]
mean value: 0.947521797154391
key: train_mcc
value: [0.9857799 0.9837635 0.97767312 0.99796333 0.9857799 0.95334985
0.9817516 0.94927144 0.9534452 0.9817516 ]
mean value: 0.9750529452374251
key: test_accuracy
value: [0.95412844 0.96330275 0.99082569 0.98165138 0.98165138 0.98165138
0.96330275 0.98165138 0.97247706 0.96330275]
mean value: 0.9733944954128441
key: train_accuracy
value: [0.99286442 0.99184506 0.98878695 0.99898063 0.99286442 0.97655454
0.99082569 0.9745158 0.97655454 0.99082569]
mean value: 0.9874617737003059
key: test_fscore
value: [0.95412844 0.96363636 0.99082569 0.98148148 0.98181818 0.98214286
0.96491228 0.98181818 0.97247706 0.96491228]
mean value: 0.9738152819961126
key: train_fscore
value: [0.9929078 0.99190283 0.98887765 0.99898271 0.9929078 0.97679112
0.99088146 0.97477296 0.97683787 0.99088146]
mean value: 0.9875743656721899
key: test_precision
value: [0.94545455 0.94642857 0.98181818 0.98148148 0.96428571 0.96491228
0.93220339 0.98181818 0.98148148 0.93220339]
mean value: 0.9612087218130929
key: train_precision
value: [0.98790323 0.98591549 0.98192771 0.99796748 0.98790323 0.96606786
0.98390342 0.96407186 0.96421471 0.98390342]
mean value: 0.9803778408423602
key: test_recall
value: [0.96296296 0.98148148 1. 0.98148148 1. 1.
1. 0.98181818 0.96363636 1. ]
mean value: 0.9871380471380471
key: train_recall
value: [0.99796334 0.99796334 0.99592668 1. 0.99796334 0.9877551
0.99795918 0.98571429 0.98979592 0.99795918]
mean value: 0.9949000374080386
key: test_roc_auc
value: [0.95420875 0.96346801 0.99090909 0.98164983 0.98181818 0.98148148
0.96296296 0.98164983 0.97255892 0.96296296]
mean value: 0.9733670033670033
key: train_roc_auc
value: [0.99285922 0.99183881 0.98877967 0.99897959 0.99285922 0.97656594
0.99083295 0.9745272 0.97656802 0.99083295]
mean value: 0.9874643584521385
key: test_jcc
value: [0.9122807 0.92982456 0.98181818 0.96363636 0.96428571 0.96491228
0.93220339 0.96428571 0.94642857 0.93220339]
mean value: 0.9491878868975211
key: train_jcc
value: [0.98591549 0.98393574 0.978 0.99796748 0.98591549 0.95463511
0.98192771 0.9507874 0.95472441 0.98192771]
mean value: 0.9755736549753808
MCC on Blind test: 0.32
Accuracy on Blind test: 0.61
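The key/value/mean blocks above (fit_time, score_time, and the per-fold test_* and train_* metrics) have the shape of sklearn's cross_validate output with multiple scorers and return_train_score=True. A minimal sketch that reproduces the reporting format; the synthetic data, scorer names, and 10-fold stratified CV are assumptions, not the script's exact code:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.metrics import make_scorer, matthews_corrcoef
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Imbalanced toy data standing in for the training split.
    X, y = make_classification(n_samples=575, n_features=20, weights=[0.95], random_state=42)

    scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
               'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
               'roc_auc': 'roc_auc', 'jcc': 'jaccard'}

    cv_out = cross_validate(LogisticRegressionCV(max_iter=3000, random_state=42), X, y,
                            cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
                            scoring=scoring, return_train_score=True)
    for key, value in cv_out.items():   # keys: fit_time, score_time, test_mcc, train_mcc, ...
        print('key:', key)
        print('value:', value)
        print('mean value:', np.mean(value))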
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01865888 0.01219416 0.0124011 0.01228523 0.01231241 0.01269364
0.01386428 0.0126965 0.01259089 0.01382422]
mean value: 0.013352131843566895
key: score_time
value: [0.07297492 0.00939155 0.00941801 0.00939441 0.00960684 0.01031232
0.00982857 0.00956607 0.00982213 0.00963259]
mean value: 0.015994739532470704
key: test_mcc
value: [0.85382288 0.85560229 0.84200168 0.85382288 0.80953606 0.87280881
0.83496131 0.8582788 0.98181818 0.8582788 ]
mean value: 0.8620931700334511
key: train_mcc
value: [0.8661361 0.86728222 0.86560142 0.87136867 0.87160834 0.86946083
0.87788349 0.87094841 0.8476271 0.86997051]
mean value: 0.8677887071504141
key: test_accuracy
value: [0.9266055 0.9266055 0.91743119 0.9266055 0.89908257 0.93577982
0.91743119 0.9266055 0.99082569 0.9266055 ]
mean value: 0.9293577981651376
key: train_accuracy
value: [0.93170234 0.93272171 0.93170234 0.93476045 0.93476045 0.93374108
0.93781855 0.93476045 0.92150866 0.93374108]
mean value: 0.9327217125382263
key: test_fscore
value: [0.92727273 0.92857143 0.92173913 0.92727273 0.90598291 0.9380531
0.91891892 0.93103448 0.99082569 0.93103448]
mean value: 0.9320705589389259
key: train_fscore
value: [0.93437806 0.93491124 0.93411996 0.93688363 0.93700787 0.93583416
0.93990148 0.93650794 0.92531523 0.93608653]
mean value: 0.9350946094457768
key: test_precision
value: [0.91071429 0.89655172 0.86885246 0.91071429 0.84126984 0.9137931
0.91071429 0.8852459 1. 0.8852459 ]
mean value: 0.9023101788293987
key: train_precision
value: [0.9 0.90630975 0.90304183 0.9082218 0.90666667 0.90630975
0.90857143 0.91119691 0.88170055 0.90322581]
mean value: 0.9035244492701532
key: test_recall
value: [0.94444444 0.96296296 0.98148148 0.94444444 0.98148148 0.96363636
0.92727273 0.98181818 0.98181818 0.98181818]
mean value: 0.9651178451178452
key: train_recall
value: [0.97148676 0.96537678 0.96741344 0.96741344 0.9694501 0.96734694
0.97346939 0.96326531 0.97346939 0.97142857]
mean value: 0.9690120121368303
key: test_roc_auc
value: [0.92676768 0.92693603 0.91801347 0.92676768 0.89983165 0.93552189
0.91734007 0.92609428 0.99090909 0.92609428]
mean value: 0.9294276094276094
key: train_roc_auc
value: [0.93166175 0.93268839 0.9316659 0.93472713 0.93472505 0.9337753
0.93785486 0.93478948 0.92156158 0.93377946]
mean value: 0.9327228895631572
key: test_jcc
value: [0.86440678 0.86666667 0.85483871 0.86440678 0.828125 0.88333333
0.85 0.87096774 0.98181818 0.87096774]
mean value: 0.8735530934688602
key: train_jcc
value: [0.87683824 0.87777778 0.87638376 0.8812616 0.88148148 0.87940631
0.8866171 0.88059701 0.86101083 0.87985213]
mean value: 0.8781226233231253
MCC on Blind test: 0.29
Accuracy on Blind test: 0.61
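The two 'Blind test' lines printed for each model are single-number summaries on the held-out blind test set. A minimal sketch of how such numbers can be produced, using synthetic data and a plain split as stand-ins for the script's training and blind-test objects:

    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score, matthews_corrcoef
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=1132, n_features=30, random_state=42)
    X_train, X_bts, y_train, y_bts = train_test_split(X, y, test_size=0.49, random_state=42)

    y_bts_pred = GaussianNB().fit(X_train, y_train).predict(X_bts)
    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_bts_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_bts_pred), 2))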
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01229978 0.01690793 0.0169034 0.01696324 0.01693296 0.01681113
0.01681638 0.01694918 0.01694226 0.01689982]
mean value: 0.01644260883331299
key: score_time
value: [0.01223588 0.01235938 0.0123477 0.01233745 0.01240754 0.01235318
0.01235676 0.01237297 0.01232409 0.01234961]
mean value: 0.012344455718994141
key: test_mcc
value: [0.74351235 0.85541029 0.80136396 0.81882759 0.78478168 0.88989899
0.85319865 0.81649832 0.83501684 0.76161616]
mean value: 0.8160124823784499
key: train_mcc
value: [0.82670934 0.81447691 0.82263186 0.81249922 0.82076536 0.80838391
0.81246155 0.81448891 0.81246155 0.82059712]
mean value: 0.816547573939551
key: test_accuracy
value: [0.87155963 0.9266055 0.89908257 0.90825688 0.88990826 0.94495413
0.9266055 0.90825688 0.91743119 0.88073394]
mean value: 0.9073394495412844
key: train_accuracy
value: [0.91335372 0.90723751 0.91131498 0.90621814 0.91029562 0.90417941
0.90621814 0.90723751 0.90621814 0.91029562]
mean value: 0.908256880733945
key: test_fscore
value: [0.86792453 0.92307692 0.89320388 0.91071429 0.89473684 0.94545455
0.92727273 0.90909091 0.91743119 0.88073394]
mean value: 0.9069639782126365
key: train_fscore
value: [0.91335372 0.90723751 0.91131498 0.9057377 0.90946502 0.90368852
0.9057377 0.90685773 0.9057377 0.91002045]
mean value: 0.9079151055700868
key: test_precision
value: [0.88461538 0.96 0.93877551 0.87931034 0.85 0.94545455
0.92727273 0.90909091 0.92592593 0.88888889]
mean value: 0.9109334236280049
key: train_precision
value: [0.91428571 0.90816327 0.9122449 0.91134021 0.91891892 0.90740741
0.90946502 0.90965092 0.90946502 0.91188525]
mean value: 0.9112826621141457
key: test_recall
value: [0.85185185 0.88888889 0.85185185 0.94444444 0.94444444 0.94545455
0.92727273 0.90909091 0.90909091 0.87272727]
mean value: 0.9045117845117845
key: train_recall
value: [0.91242363 0.90631365 0.91038697 0.90020367 0.90020367 0.9
0.90204082 0.90408163 0.90204082 0.90816327]
mean value: 0.9045858098840351
key: test_roc_auc
value: [0.87138047 0.92626263 0.8986532 0.90858586 0.89040404 0.94494949
0.92659933 0.90824916 0.91750842 0.88080808]
mean value: 0.9073400673400673
key: train_roc_auc
value: [0.91335467 0.90723846 0.91131593 0.90622428 0.91030591 0.90417515
0.90621389 0.9072343 0.90621389 0.91029345]
mean value: 0.9082569932249885
key: test_jcc
value: [0.76666667 0.85714286 0.80701754 0.83606557 0.80952381 0.89655172
0.86440678 0.83333333 0.84745763 0.78688525]
mean value: 0.8305051161116039
key: train_jcc
value: [0.84052533 0.83022388 0.83707865 0.82771536 0.83396226 0.82429907
0.82771536 0.82958801 0.82771536 0.83489681]
mean value: 0.8313720083087689
MCC on Blind test: 0.24
Accuracy on Blind test: 0.6
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.0153439 0.01146626 0.01095605 0.01241088 0.01217341 0.01237679
0.01235223 0.01266718 0.01223803 0.01281548]
mean value: 0.012480020523071289
key: score_time
value: [0.03982353 0.01768589 0.01724744 0.01506829 0.01472926 0.01545072
0.01538634 0.015414 0.01499319 0.01525664]
mean value: 0.018105530738830568
key: test_mcc
value: [0.81882759 0.92659933 0.946411 0.96329966 0.946411 0.91202529
0.92719967 0.91202529 0.98181211 0.98181211]
mean value: 0.931642304301634
key: train_mcc
value: [0.95763309 0.95973466 0.95145652 0.95951143 0.95355358 0.95553835
0.96550464 0.96550464 0.94927144 0.94763304]
mean value: 0.9565341373626831
key: test_accuracy
value: [0.90825688 0.96330275 0.97247706 0.98165138 0.97247706 0.95412844
0.96330275 0.95412844 0.99082569 0.99082569]
mean value: 0.9651376146788991
key: train_accuracy
value: [0.97859327 0.97961264 0.97553517 0.97961264 0.97655454 0.9775739
0.98267074 0.98267074 0.9745158 0.97349643]
mean value: 0.9780835881753314
key: test_fscore
value: [0.91071429 0.96296296 0.97297297 0.98148148 0.97297297 0.95652174
0.96428571 0.95652174 0.99099099 0.99099099]
mean value: 0.9660415850633242
key: train_fscore
value: [0.97893681 0.97995992 0.97590361 0.97987928 0.97693079 0.9778672
0.98281092 0.98281092 0.97477296 0.9739479 ]
mean value: 0.9783820308622914
key: test_precision
value: [0.87931034 0.96296296 0.94736842 0.98148148 0.94736842 0.91666667
0.94736842 0.91666667 0.98214286 0.98214286]
mean value: 0.9463479100048973
key: train_precision
value: [0.96442688 0.96449704 0.96237624 0.96819085 0.96245059 0.96428571
0.9739479 0.9739479 0.96407186 0.95669291]
mean value: 0.9654887879812519
key: test_recall
value: [0.94444444 0.96296296 1. 0.98148148 1. 1.
0.98181818 1. 1. 1. ]
mean value: 0.9870707070707071
key: train_recall
value: [0.99389002 0.99592668 0.9898167 0.99185336 0.99185336 0.99183673
0.99183673 0.99183673 0.98571429 0.99183673]
mean value: 0.9916401346689389
key: test_roc_auc
value: [0.90858586 0.96329966 0.97272727 0.98164983 0.97272727 0.9537037
0.96313131 0.9537037 0.99074074 0.99074074]
mean value: 0.9651010101010101
key: train_roc_auc
value: [0.97857766 0.97959599 0.9755206 0.97960015 0.97653893 0.97758843
0.98268008 0.98268008 0.9745272 0.97351511]
mean value: 0.97808242237832
key: test_jcc
value: [0.83606557 0.92857143 0.94736842 0.96363636 0.94736842 0.91666667
0.93103448 0.91666667 0.98214286 0.98214286]
mean value: 0.9351663738461216
key: train_jcc
value: [0.95874263 0.96070727 0.95294118 0.96055227 0.95490196 0.95669291
0.96620278 0.96620278 0.9507874 0.94921875]
mean value: 0.9576949938828678
MCC on Blind test: 0.26
Accuracy on Blind test: 0.59
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03312922 0.02710271 0.02765346 0.02737236 0.02725911 0.02756953
0.0275867 0.02737093 0.02711725 0.02727389]
mean value: 0.027943515777587892
key: score_time
value: [0.01420808 0.01313591 0.01342154 0.01315141 0.01308537 0.01318836
0.01315355 0.01331091 0.0128572 0.01323676]
mean value: 0.013274908065795898
key: test_mcc
value: [0.90841751 0.94509941 1. 0.96329966 0.96396098 0.98181211
0.92905936 0.98181211 0.98181818 0.94635821]
mean value: 0.9601637548953743
key: train_mcc
value: [0.97968567 0.97560783 0.96946898 0.96946898 0.97556738 0.97351478
0.97351478 0.97149021 0.96946962 0.97560844]
mean value: 0.9733396664150108
key: test_accuracy
value: [0.95412844 0.97247706 1. 0.98165138 0.98165138 0.99082569
0.96330275 0.99082569 0.99082569 0.97247706]
mean value: 0.9798165137614679
key: train_accuracy
value: [0.98980632 0.98776758 0.98470948 0.98470948 0.98776758 0.98674822
0.98674822 0.98572885 0.98470948 0.98776758]
mean value: 0.9866462793068298
key: test_fscore
value: [0.95412844 0.97196262 1. 0.98148148 0.98181818 0.99099099
0.96491228 0.99099099 0.99082569 0.97345133]
mean value: 0.9800561998679824
key: train_fscore
value: [0.98987854 0.98785425 0.98480243 0.98480243 0.98782961 0.98677518
0.98677518 0.98577236 0.98477157 0.98782961]
mean value: 0.9867091173333614
key: test_precision
value: [0.94545455 0.98113208 1. 0.98148148 0.96428571 0.98214286
0.93220339 0.98214286 1. 0.94827586]
mean value: 0.9717118782878628
key: train_precision
value: [0.98390342 0.98189135 0.97983871 0.97983871 0.98383838 0.98377282
0.98377282 0.98178138 0.97979798 0.98185484]
mean value: 0.9820290405776002
key: test_recall
value: [0.96296296 0.96296296 1. 0.98148148 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9889225589225589
key: train_recall
value: [0.99592668 0.99389002 0.9898167 0.9898167 0.99185336 0.98979592
0.98979592 0.98979592 0.98979592 0.99387755]
mean value: 0.9914364686811588
key: test_roc_auc
value: [0.95420875 0.97239057 1. 0.98164983 0.98181818 0.99074074
0.96296296 0.99074074 0.99090909 0.97222222]
mean value: 0.9797643097643097
key: train_roc_auc
value: [0.98980007 0.98776134 0.98470427 0.98470427 0.98776341 0.98675132
0.98675132 0.98573299 0.98471466 0.98777381]
mean value: 0.9866457458747246
key: test_jcc
value: [0.9122807 0.94545455 1. 0.96363636 0.96428571 0.98214286
0.93220339 0.98214286 0.98181818 0.94827586]
mean value: 0.9612240473134379
key: train_jcc
value: [0.97995992 0.976 0.97005988 0.97005988 0.9759519 0.97389558
0.97389558 0.97194389 0.97 0.9759519 ]
mean value: 0.9737718540368138
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.23760557 2.39744544 2.93186569 2.52583194 2.58781528 2.25252485
2.65761113 2.15109515 1.78711081 2.11473489]
mean value: 2.3643640756607054
key: score_time
value: [0.01260877 0.01268816 0.01267433 0.01268578 0.01266146 0.01271725
0.01503229 0.0170579 0.01278281 0.01270413]
mean value: 0.013361287117004395
key: test_mcc
value: [0.94511785 0.92724828 0.98181211 0.96329966 0.92724828 0.98181211
0.91202529 0.98181211 0.96396098 0.96393713]
mean value: 0.954827381792807
key: train_mcc
value: [1. 1. 1. 1. 1. 0.99796334
1. 0.99796334 0.99592252 0.99796334]
mean value: 0.9989812544162268
key: test_accuracy
value: [0.97247706 0.96330275 0.99082569 0.98165138 0.96330275 0.99082569
0.95412844 0.99082569 0.98165138 0.98165138]
mean value: 0.9770642201834863
key: train_accuracy
value: [1. 1. 1. 1. 1. 0.99898063
1. 0.99898063 0.99796126 0.99898063]
mean value: 0.9994903160040775
key: test_fscore
value: [0.97247706 0.96363636 0.99065421 0.98148148 0.96363636 0.99099099
0.95652174 0.99099099 0.98148148 0.98214286]
mean value: 0.9774013538318624
key: train_fscore
value: [1. 1. 1. 1. 1. 0.99898063
1. 0.99898063 0.99795918 0.99898063]
mean value: 0.9994901079697934
key: test_precision
value: [0.96363636 0.94642857 1. 0.98148148 0.94642857 0.98214286
0.91666667 0.98214286 1. 0.96491228]
mean value: 0.9683839649629123
key: train_precision
value: [1. 1. 1. 1. 1. 0.99796334
1. 0.99796334 0.99795918 0.99796334]
mean value: 0.9991849204040069
key: test_recall
value: [0.98148148 0.98148148 0.98148148 0.98148148 0.98148148 1.
1. 1. 0.96363636 1. ]
mean value: 0.9871043771043772
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99795918 1. ]
mean value: 0.9997959183673469
key: test_roc_auc
value: [0.97255892 0.96346801 0.99074074 0.98164983 0.96346801 0.99074074
0.9537037 0.99074074 0.98181818 0.98148148]
mean value: 0.977037037037037
key: train_roc_auc
value: [1. 1. 1. 1. 1. 0.99898167
1. 0.99898167 0.99796126 0.99898167]
mean value: 0.9994906272081133
key: test_jcc
value: [0.94642857 0.92982456 0.98148148 0.96363636 0.92982456 0.98214286
0.91666667 0.98214286 0.96363636 0.96491228]
mean value: 0.9560696564643933
key: train_jcc
value: [1. 1. 1. 1. 1. 0.99796334
1. 0.99796334 0.99592668 0.99796334]
mean value: 0.9989816700610998
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.07054901 0.06181669 0.06172919 0.05932713 0.06351018 0.06178021
0.0615108 0.0635767 0.0611248 0.06372452]
mean value: 0.06286492347717285
key: score_time
value: [0.00950432 0.00932312 0.00905943 0.00901723 0.00903702 0.00915956
0.00907493 0.00908566 0.0090878 0.00900388]
mean value: 0.009135293960571288
key: test_mcc
value: [0.94509941 1. 0.96393713 0.98181211 0.9291517 0.94511785
0.88989899 0.96393713 0.96393713 0.92905936]
mean value: 0.9511950806790457
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97247706 1. 0.98165138 0.99082569 0.96330275 0.97247706
0.94495413 0.98165138 0.98165138 0.96330275]
mean value: 0.9752293577981651
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97196262 1. 0.98113208 0.99065421 0.96428571 0.97247706
0.94545455 0.98214286 0.98214286 0.96491228]
mean value: 0.9755164216849517
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.98113208 1. 1. 1. 0.93103448 0.98148148
0.94545455 0.96491228 0.96491228 0.93220339]
mean value: 0.9701130536400363
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.96296296 0.98148148 1. 0.96363636
0.94545455 1. 1. 1. ]
mean value: 0.9816498316498317
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97239057 1. 0.98148148 0.99074074 0.96363636 0.97255892
0.94494949 0.98148148 0.98148148 0.96296296]
mean value: 0.9751683501683501
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94545455 1. 0.96296296 0.98148148 0.93103448 0.94642857
0.89655172 0.96491228 0.96491228 0.93220339]
mean value: 0.952594171945813
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.56
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13620734 0.13592768 0.14145327 0.13965559 0.13692927 0.13765979
0.13613105 0.13678074 0.13775492 0.13754439]
mean value: 0.13760440349578856
key: score_time
value: [0.01817727 0.01823282 0.01932836 0.01829553 0.01836991 0.01836514
0.01826692 0.01823115 0.01880527 0.01826 ]
mean value: 0.018433237075805665
key: test_mcc
value: [0.90841751 0.94509941 1. 0.98181211 0.98181818 1.
0.94509941 0.96329966 1. 0.96393713]
mean value: 0.9689483423563136
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95412844 0.97247706 1. 0.99082569 0.99082569 1.
0.97247706 0.98165138 1. 0.98165138]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95412844 0.97196262 1. 0.99065421 0.99082569 1.
0.97297297 0.98181818 1. 0.98214286]
mean value: 0.9844504962804286
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94545455 0.98113208 1. 1. 0.98181818 1.
0.96428571 0.98181818 1. 0.96491228]
mean value: 0.9819420979550075
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.96296296 1. 0.98148148 1. 1.
0.98181818 0.98181818 1. 1. ]
mean value: 0.987104377104377
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95420875 0.97239057 1. 0.99074074 0.99090909 1.
0.97239057 0.98164983 1. 0.98148148]
mean value: 0.9843771043771044
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9122807 0.94545455 1. 0.98148148 0.98181818 1.
0.94736842 0.96428571 1. 0.96491228]
mean value: 0.9697601326548695
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.56
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01190448 0.01207089 0.01192713 0.0119226 0.0119257 0.01208067
0.01233387 0.01218677 0.01201677 0.01214457]
mean value: 0.01205134391784668
key: score_time
value: [0.00913978 0.00910997 0.00899887 0.00900722 0.00899005 0.00908947
0.00914955 0.00911641 0.00894523 0.00904179]
mean value: 0.009058833122253418
key: test_mcc
value: [0.87171717 0.90838671 0.94511785 0.92724828 0.90841751 0.87167401
0.90838671 0.90841751 0.87171717 0.92719967]
mean value: 0.9048282595925088
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93577982 0.95412844 0.97247706 0.96330275 0.95412844 0.93577982
0.95412844 0.95412844 0.93577982 0.96330275]
mean value: 0.9522935779816514
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93577982 0.95327103 0.97247706 0.96363636 0.95412844 0.93693694
0.95495495 0.95412844 0.93577982 0.96428571]
mean value: 0.9525378575833003
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92727273 0.96226415 0.96363636 0.94642857 0.94545455 0.92857143
0.94642857 0.96296296 0.94444444 0.94736842]
mean value: 0.9474832187195643
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94444444 0.94444444 0.98148148 0.98148148 0.96296296 0.94545455
0.96363636 0.94545455 0.92727273 0.98181818]
mean value: 0.9578451178451178
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93585859 0.9540404 0.97255892 0.96346801 0.95420875 0.93569024
0.9540404 0.95420875 0.93585859 0.96313131]
mean value: 0.9523063973063972
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.87931034 0.91071429 0.94642857 0.92982456 0.9122807 0.88135593
0.9137931 0.9122807 0.87931034 0.93103448]
mean value: 0.9096333030120597
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.6
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [same list of models as above]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [3.192981 3.21978426 3.26361322 3.27795982 3.19749761 3.2129035
3.14740181 3.20146179 3.23462439 3.21413517]
mean value: 3.2162362575531005
key: score_time
value: [0.09524465 0.09526968 0.09523535 0.09465885 0.09512901 0.09531832
0.09523821 0.09471774 0.09536004 0.09545851]
mean value: 0.09516303539276123
key: test_mcc
value: [0.96329966 1. 0.98181211 0.98181211 0.98181818 1.
0.94511785 0.98181211 1. 0.96393713]
mean value: 0.9799609160184436
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98165138 1. 0.99082569 0.99082569 0.99082569 1.
0.97247706 0.99082569 1. 0.98165138]
mean value: 0.989908256880734
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98148148 1. 0.99065421 0.99065421 0.99082569 1.
0.97247706 0.99099099 1. 0.98214286]
mean value: 0.989922649312386
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.98148148 1. 1. 1. 0.98181818 1.
0.98148148 0.98214286 1. 0.96491228]
mean value: 0.9891836282625757
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98148148 1. 0.98148148 0.98148148 1. 1.
0.96363636 1. 1. 1. ]
mean value: 0.9908080808080808
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98164983 1. 0.99074074 0.99074074 0.99090909 1.
0.97255892 0.99074074 1. 0.98148148]
mean value: 0.9898821548821548
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96363636 1. 0.98148148 0.98148148 0.98181818 1.
0.94642857 0.98214286 1. 0.96491228]
mean value: 0.9801901217690692
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.25
Accuracy on Blind test: 0.57
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [same list of models as above]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
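The FutureWarning above comes from max_features='auto' in the Random Forest2 estimator; as the message says, 'sqrt' is the equivalent, non-deprecated value for classifiers. A minimal sketch of the substitution, keeping the other hyperparameters from the logged estimator:

    from sklearn.ensemble import RandomForestClassifier

    # 'sqrt' reproduces the old 'auto' behaviour for classifiers without the warning.
    rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                                 n_estimators=1000, n_jobs=10,
                                 oob_score=True, random_state=42)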
key: fit_time
value: [1.15664482 1.25536323 1.23685193 1.19847178 1.22918534 1.21921778
1.28991628 1.25609398 1.18945456 1.21153784]
mean value: 1.2242737531661987
key: score_time
value: [0.28573251 0.21750188 0.26001978 0.21304607 0.22797418 0.21436405
0.26951408 0.28902078 0.24877381 0.24985695]
mean value: 0.2475804090499878
key: test_mcc
value: [0.94511785 1. 0.98181211 0.98181211 0.92724828 0.98181211
0.90838671 0.98181211 0.96393713 0.96393713]
mean value: 0.9635875555795308
key: train_mcc
value: [0.98980835 0.98573085 0.98573085 0.98777573 0.98776757 0.98573091
0.98777583 0.98980835 0.98573091 0.98776757]
mean value: 0.9873626926348411
key: test_accuracy
value: [0.97247706 1. 0.99082569 0.99082569 0.96330275 0.99082569
0.95412844 0.99082569 0.98165138 0.98165138]
mean value: 0.981651376146789
key: train_accuracy
value: [0.99490316 0.99286442 0.99286442 0.99388379 0.99388379 0.99286442
0.99388379 0.99490316 0.99286442 0.99388379]
mean value: 0.9936799184505607
key: test_fscore
value: [0.97247706 1. 0.99065421 0.99065421 0.96363636 0.99099099
0.95495495 0.99099099 0.98214286 0.98214286]
mean value: 0.9818644490294152
key: train_fscore
value: [0.99491353 0.99287894 0.99287894 0.99390244 0.99389002 0.99286442
0.99389002 0.99489275 0.99286442 0.99387755]
mean value: 0.993685304063256
key: test_precision
value: [0.96363636 1. 1. 1. 0.94642857 0.98214286
0.94642857 0.98214286 0.96491228 0.96491228]
mean value: 0.975060378218273
key: train_precision
value: [0.99390244 0.99186992 0.99186992 0.99188641 0.99389002 0.99185336
0.99186992 0.99591002 0.99185336 0.99387755]
mean value: 0.992878291767276
key: test_recall
value: [0.98148148 1. 0.98148148 0.98148148 0.98148148 1.
0.96363636 1. 1. 1. ]
mean value: 0.9889562289562289
key: train_recall
value: [0.99592668 0.99389002 0.99389002 0.99592668 0.99389002 0.99387755
0.99591837 0.99387755 0.99387755 0.99387755]
mean value: 0.9944951993017166
key: test_roc_auc
value: [0.97255892 1. 0.99074074 0.99074074 0.96346801 0.99074074
0.9540404 0.99074074 0.98148148 0.98148148]
mean value: 0.9815993265993266
key: train_roc_auc
value: [0.99490212 0.99286338 0.99286338 0.99388171 0.99388379 0.99286546
0.99388586 0.99490212 0.99286546 0.99388379]
mean value: 0.9936797040608504
key: test_jcc
value: [0.94642857 1. 0.98148148 0.98148148 0.92982456 0.98214286
0.9137931 0.98214286 0.96491228 0.96491228]
mean value: 0.9647119474932542
key: train_jcc
value: [0.98987854 0.98585859 0.98585859 0.98787879 0.98785425 0.98582996
0.98785425 0.9898374 0.98582996 0.98782961]
mean value: 0.9874509936137159
MCC on Blind test: 0.3
Accuracy on Blind test: 0.59
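Note: the "MCC on Blind test" / "Accuracy on Blind test" lines report performance on the held-out blind test set rather than on the 10 CV folds above. A minimal, self-contained sketch (toy data and hypothetical names, not this project's own code) of how such blind-test figures can be produced with scikit-learn:

# Hypothetical sketch: train on one (imbalanced) split, then score a held-out
# "blind" split the model never saw. Toy data stands in for the real features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
X_tr, X_bts, y_tr, y_bts = train_test_split(X, y, test_size=0.5, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
y_pred = clf.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))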
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02857256 0.01666284 0.01697707 0.01707339 0.01707959 0.01700878
0.01709127 0.01708722 0.01719165 0.01719737]
mean value: 0.018194174766540526
key: score_time
value: [0.01237464 0.01241755 0.01254058 0.01241088 0.01241207 0.01243305
0.01244402 0.01243639 0.01238298 0.01241779]
mean value: 0.012426996231079101
key: test_mcc
value: [0.74351235 0.85541029 0.80136396 0.81882759 0.78478168 0.88989899
0.85319865 0.81649832 0.83501684 0.76161616]
mean value: 0.8160124823784499
key: train_mcc
value: [0.82670934 0.81447691 0.82263186 0.81249922 0.82076536 0.80838391
0.81246155 0.81448891 0.81246155 0.82059712]
mean value: 0.816547573939551
key: test_accuracy
value: [0.87155963 0.9266055 0.89908257 0.90825688 0.88990826 0.94495413
0.9266055 0.90825688 0.91743119 0.88073394]
mean value: 0.9073394495412844
key: train_accuracy
value: [0.91335372 0.90723751 0.91131498 0.90621814 0.91029562 0.90417941
0.90621814 0.90723751 0.90621814 0.91029562]
mean value: 0.908256880733945
key: test_fscore
value: [0.86792453 0.92307692 0.89320388 0.91071429 0.89473684 0.94545455
0.92727273 0.90909091 0.91743119 0.88073394]
mean value: 0.9069639782126365
key: train_fscore
value: [0.91335372 0.90723751 0.91131498 0.9057377 0.90946502 0.90368852
0.9057377 0.90685773 0.9057377 0.91002045]
mean value: 0.9079151055700868
key: test_precision
value: [0.88461538 0.96 0.93877551 0.87931034 0.85 0.94545455
0.92727273 0.90909091 0.92592593 0.88888889]
mean value: 0.9109334236280049
key: train_precision
value: [0.91428571 0.90816327 0.9122449 0.91134021 0.91891892 0.90740741
0.90946502 0.90965092 0.90946502 0.91188525]
mean value: 0.9112826621141457
key: test_recall
value: [0.85185185 0.88888889 0.85185185 0.94444444 0.94444444 0.94545455
0.92727273 0.90909091 0.90909091 0.87272727]
mean value: 0.9045117845117845
key: train_recall
value: [0.91242363 0.90631365 0.91038697 0.90020367 0.90020367 0.9
0.90204082 0.90408163 0.90204082 0.90816327]
mean value: 0.9045858098840351
key: test_roc_auc
value: [0.87138047 0.92626263 0.8986532 0.90858586 0.89040404 0.94494949
0.92659933 0.90824916 0.91750842 0.88080808]
mean value: 0.9073400673400673
key: train_roc_auc
value: [0.91335467 0.90723846 0.91131593 0.90622428 0.91030591 0.90417515
0.90621389 0.9072343 0.90621389 0.91029345]
mean value: 0.9082569932249885
key: test_jcc
value: [0.76666667 0.85714286 0.80701754 0.83606557 0.80952381 0.89655172
0.86440678 0.83333333 0.84745763 0.78688525]
mean value: 0.8305051161116039
key: train_jcc
value: [0.84052533 0.83022388 0.83707865 0.82771536 0.83396226 0.82429907
0.82771536 0.82958801 0.82771536 0.83489681]
mean value: 0.8313720083087689
MCC on Blind test: 0.24
Accuracy on Blind test: 0.6
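Note: the repeating key/value blocks (fit_time, score_time, test_*/train_* arrays of ten entries plus a mean) have the shape of scikit-learn's cross_validate output with return_train_score=True and multiple scorers. A minimal sketch under that assumption, with toy data and illustrative scorer names:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB

# 10-fold CV with two scorers yields fit_time, score_time, test_mcc, train_mcc,
# test_accuracy and train_accuracy arrays of length 10, mirroring the log format.
X, y = make_classification(n_samples=500, n_features=30, random_state=42)
scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy'}
cv_out = cross_validate(BernoulliNB(), X, y, cv=10, scoring=scoring,
                        return_train_score=True)
for key, value in cv_out.items():
    print('key:', key)
    print('value:', value)
    print('mean value:', np.mean(value))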
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.13928676 0.13060784 0.21524143 0.14963627 0.15037847 0.1270678
0.13197303 0.13237286 0.12784719 0.12808561]
mean value: 0.1432497262954712
key: score_time
value: [0.01138091 0.01127362 0.01427794 0.01234388 0.01148152 0.01167202
0.01126146 0.01150727 0.01140141 0.011204 ]
mean value: 0.011780405044555664
key: test_mcc
value: [0.90841751 1. 0.96393713 0.98181211 0.96396098 0.96329966
0.94511785 0.96329966 0.98181211 0.96393713]
mean value: 0.9635594149827394
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95412844 1. 0.98165138 0.99082569 0.98165138 0.98165138
0.97247706 0.98165138 0.99082569 0.98165138]
mean value: 0.981651376146789
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95412844 1. 0.98113208 0.99065421 0.98181818 0.98181818
0.97247706 0.98181818 0.99099099 0.98214286]
mean value: 0.9816980179254724
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94545455 1. 1. 1. 0.96428571 0.98181818
0.98148148 0.98181818 0.98214286 0.96491228]
mean value: 0.9801913242702717
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.96296296 0.98148148 1. 0.98181818
0.96363636 0.98181818 1. 1. ]
mean value: 0.9834680134680135
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95420875 1. 0.98148148 0.99074074 0.98181818 0.98164983
0.97255892 0.98164983 0.99074074 0.98148148]
mean value: 0.9816329966329966
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9122807 1. 0.96296296 0.98148148 0.96428571 0.96428571
0.94642857 0.96428571 0.98214286 0.96491228]
mean value: 0.9643065998329157
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.59
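Note: the *_jcc entries are consistent with the Jaccard index of the positive class, which relates to the F-score as J = F1 / (2 - F1) (e.g. 0.95412844 / (2 - 0.95412844) ≈ 0.9123, matching the first test_jcc entry above). A small sketch of that identity using scikit-learn:

from sklearn.metrics import f1_score, jaccard_score

# Toy labels: TP=3, FP=1, FN=1, so F1 = 6/8 = 0.75 and Jaccard = 3/5 = 0.6.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
f1 = f1_score(y_true, y_pred)
jcc = jaccard_score(y_true, y_pred)
print(round(jcc, 4), round(f1 / (2 - f1), 4))  # both print 0.6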
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05295801 0.08014059 0.09963298 0.083951 0.06942058 0.08701682
0.0714407 0.08109713 0.0660603 0.09231591]
mean value: 0.0784034013748169
key: score_time
value: [0.01235604 0.01253247 0.03159952 0.02604461 0.01255369 0.02007866
0.01244569 0.01251531 0.0123868 0.02083731]
mean value: 0.017335009574890137
key: test_mcc
value: [0.946411 0.89544301 0.946411 0.98181818 0.98181818 0.96393713
0.89223482 0.846254 0.96393713 0.91202529]
mean value: 0.9330289738014781
key: train_mcc
value: [0.9837635 0.97773807 0.9857799 0.97974261 0.9837635 0.9857802
0.97981668 0.9817516 0.9837639 0.9817516 ]
mean value: 0.9823651560326023
key: test_accuracy
value: [0.97247706 0.94495413 0.97247706 0.99082569 0.99082569 0.98165138
0.94495413 0.91743119 0.98165138 0.95412844]
mean value: 0.9651376146788991
key: train_accuracy
value: [0.99184506 0.98878695 0.99286442 0.98980632 0.99184506 0.99286442
0.98980632 0.99082569 0.99184506 0.99082569]
mean value: 0.9911314984709481
key: test_fscore
value: [0.97297297 0.94736842 0.97297297 0.99082569 0.99082569 0.98214286
0.94736842 0.92436975 0.98214286 0.95652174]
mean value: 0.9667511365513307
key: train_fscore
value: [0.99190283 0.9889001 0.9929078 0.98989899 0.99190283 0.9928934
0.98989899 0.99088146 0.99188641 0.99088146]
mean value: 0.9911954278825454
key: test_precision
value: [0.94736842 0.9 0.94736842 0.98181818 0.98181818 0.96491228
0.91525424 0.859375 0.96491228 0.91666667]
mean value: 0.9379493671099938
key: train_precision
value: [0.98591549 0.98 0.98790323 0.98196393 0.98591549 0.98787879
0.98 0.98390342 0.9858871 0.98390342]
mean value: 0.9843270865276915
key: test_recall
value: [1. 1. 1. 1. 1. 1.
0.98181818 1. 1. 1. ]
mean value: 0.9981818181818182
key: train_recall
value: [0.99796334 0.99796334 0.99796334 0.99796334 0.99796334 0.99795918
1. 0.99795918 0.99795918 0.99795918]
mean value: 0.9981653435304876
key: test_roc_auc
value: [0.97272727 0.94545455 0.97272727 0.99090909 0.99090909 0.98148148
0.94461279 0.91666667 0.98148148 0.9537037 ]
mean value: 0.9650673400673401
key: train_roc_auc
value: [0.99183881 0.98877759 0.99285922 0.989798 0.99183881 0.99286961
0.9898167 0.99083295 0.99185128 0.99083295]
mean value: 0.9911315931667983
key: test_jcc
value: [0.94736842 0.9 0.94736842 0.98181818 0.98181818 0.96491228
0.9 0.859375 0.96491228 0.91666667]
mean value: 0.9364239433811802
key: train_jcc
value: [0.98393574 0.97804391 0.98591549 0.98 0.98393574 0.9858871
0.98 0.98192771 0.98390342 0.98192771]
mean value: 0.9825476830061249
MCC on Blind test: 0.21
Accuracy on Blind test: 0.57
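Note: the MCC values throughout this log follow the standard confusion-matrix definition, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A small self-contained check (toy labels) that sklearn's matthews_corrcoef agrees with the formula:

import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()   # TN=3, FP=1, FN=1, TP=5
mcc_manual = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round(mcc_manual, 6), round(matthews_corrcoef(y_true, y_pred), 6))  # both 0.583333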
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02079773 0.01609492 0.01598954 0.01628113 0.0189333 0.01634192
0.01601601 0.01622939 0.01595616 0.01598763]
mean value: 0.016862773895263673
key: score_time
value: [0.01259875 0.01211143 0.01218486 0.0121243 0.01218343 0.01213431
0.01215887 0.01219344 0.01213384 0.01207805]
mean value: 0.012190127372741699
key: test_mcc
value: [0.76248469 0.78039748 0.83496131 0.76161616 0.71100747 0.79824861
0.74368478 0.78039748 0.79824861 0.78039748]
mean value: 0.7751444102497523
key: train_mcc
value: [0.77984039 0.76966498 0.76962372 0.77777963 0.79002606 0.77574619
0.77370215 0.76962372 0.77373792 0.77371595]
mean value: 0.7753460692713874
key: test_accuracy
value: [0.88073394 0.88990826 0.91743119 0.88073394 0.85321101 0.89908257
0.87155963 0.88990826 0.89908257 0.88990826]
mean value: 0.8871559633027524
key: train_accuracy
value: [0.88990826 0.88481142 0.88481142 0.88888889 0.8950051 0.88786952
0.88685015 0.88481142 0.88685015 0.88685015]
mean value: 0.8876656472986748
key: test_fscore
value: [0.87619048 0.89090909 0.91588785 0.88073394 0.85964912 0.9009009
0.87037037 0.88888889 0.9009009 0.88888889]
mean value: 0.8873320435277953
key: train_fscore
value: [0.89046653 0.88433982 0.88504578 0.88888889 0.8947906 0.88798371
0.88685015 0.8845761 0.88615385 0.88708037]
mean value: 0.8876175787042375
key: test_precision
value: [0.90196078 0.875 0.9245283 0.87272727 0.81666667 0.89285714
0.88679245 0.90566038 0.89285714 0.90566038]
mean value: 0.8874710518855913
key: train_precision
value: [0.88686869 0.88888889 0.88414634 0.88979592 0.89754098 0.88617886
0.88594705 0.88548057 0.89072165 0.88438134]
mean value: 0.8879950288650756
key: test_recall
value: [0.85185185 0.90740741 0.90740741 0.88888889 0.90740741 0.90909091
0.85454545 0.87272727 0.90909091 0.87272727]
mean value: 0.8881144781144781
key: train_recall
value: [0.89409369 0.87983707 0.88594705 0.88798371 0.89205703 0.88979592
0.8877551 0.88367347 0.88163265 0.88979592]
mean value: 0.8872571594829378
key: test_roc_auc
value: [0.88047138 0.89006734 0.91734007 0.88080808 0.8537037 0.8989899
0.87171717 0.89006734 0.8989899 0.89006734]
mean value: 0.8872222222222222
key: train_roc_auc
value: [0.88990399 0.88481649 0.88481026 0.88888981 0.89500811 0.88787148
0.88685107 0.88481026 0.88684484 0.88685315]
mean value: 0.8876659462155534
key: test_jcc
value: [0.77966102 0.80327869 0.84482759 0.78688525 0.75384615 0.81967213
0.7704918 0.8 0.81967213 0.8 ]
mean value: 0.7978334757002203
key: train_jcc
value: [0.80255941 0.79266055 0.79379562 0.8 0.80961183 0.7985348
0.7967033 0.79304029 0.79558011 0.79707495]
mean value: 0.7979560868903866
MCC on Blind test: 0.37
Accuracy on Blind test: 0.66
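Note: MultinomialNB only accepts non-negative features, so the MinMaxScaler in the 'prep' step above (which maps the numeric columns to [0, 1]) is what lets it run on features that can take negative values. A minimal illustration, assuming nothing beyond scikit-learn defaults:

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-1.2, 0.4], [0.3, -2.0], [2.5, 1.1], [0.0, 0.7]])
y = np.array([0, 1, 0, 1])
X_scaled = MinMaxScaler().fit_transform(X)   # every value now lies in [0, 1]
MultinomialNB().fit(X_scaled, y)             # fits without error
# MultinomialNB().fit(X, y) would raise a ValueError about negative values.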
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03137422 0.03929567 0.02904129 0.03385782 0.03152227 0.0290134
0.02575612 0.02675271 0.035393 0.02952981]
mean value: 0.03115363121032715
key: score_time
value: [0.01219153 0.01214862 0.01207137 0.01212645 0.01213336 0.01214814
0.01212502 0.01226616 0.0121429 0.01237202]
mean value: 0.012172555923461914
key: test_mcc
value: [0.90841751 0.92724828 0.96393713 0.96393713 0.946411 0.96393713
0.92905936 0.98181211 1. 0.96393713]
mean value: 0.9548696778106656
key: train_mcc
value: [0.99390234 0.9778193 0.9837639 0.9406024 0.99187796 0.98166983
0.98170255 0.98980839 0.95806676 0.9837635 ]
mean value: 0.978297693597624
key: test_accuracy
value: [0.95412844 0.96330275 0.98165138 0.98165138 0.97247706 0.98165138
0.96330275 0.99082569 1. 0.98165138]
mean value: 0.9770642201834863
key: train_accuracy
value: [0.9969419 0.98878695 0.99184506 0.96941896 0.99592253 0.99082569
0.99082569 0.99490316 0.97859327 0.99184506]
mean value: 0.9889908256880734
key: test_fscore
value: [0.95412844 0.96363636 0.98113208 0.98113208 0.97297297 0.98214286
0.96491228 0.99099099 1. 0.98214286]
mean value: 0.9773190913898165
key: train_fscore
value: [0.99695431 0.98892246 0.99180328 0.96848739 0.9959432 0.99084435
0.99086294 0.99490316 0.97902098 0.99178645]
mean value: 0.9889528535316982
key: test_precision
value: [0.94545455 0.94642857 1. 1. 0.94736842 0.96491228
0.93220339 0.98214286 1. 0.96491228]
mean value: 0.9683422346312622
key: train_precision
value: [0.99392713 0.97808765 0.99793814 1. 0.99191919 0.98782961
0.98585859 0.99389002 0.95890411 0.99793388]
mean value: 0.9886288325873761
key: test_recall
value: [0.96296296 0.98148148 0.96296296 0.96296296 1. 1.
1. 1. 1. 1. ]
mean value: 0.987037037037037
key: train_recall
value: [1. 1. 0.98574338 0.9389002 1. 0.99387755
0.99591837 0.99591837 1. 0.98571429]
mean value: 0.9896072155949956
key: test_roc_auc
value: [0.95420875 0.96346801 0.98148148 0.98148148 0.97272727 0.98148148
0.96296296 0.99074074 1. 0.98148148]
mean value: 0.977003367003367
key: train_roc_auc
value: [0.99693878 0.98877551 0.99185128 0.9694501 0.99591837 0.9908288
0.99083087 0.99490419 0.97861507 0.99183881]
mean value: 0.9889951785194729
key: test_jcc
value: [0.9122807 0.92982456 0.96296296 0.96296296 0.94736842 0.96491228
0.93220339 0.98214286 1. 0.96491228]
mean value: 0.9559570418513327
key: train_jcc
value: [0.99392713 0.97808765 0.98373984 0.9389002 0.99191919 0.98185484
0.98189135 0.98985801 0.95890411 0.98370672]
mean value: 0.9782789037427249
MCC on Blind test: 0.23
Accuracy on Blind test: 0.57
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02381825 0.02869153 0.02777767 0.02622557 0.02247834 0.02444983
0.0302763 0.02613544 0.02430272 0.02424407]
mean value: 0.025839972496032714
key: score_time
value: [0.01220822 0.01211882 0.01219845 0.0121603 0.01212859 0.01205921
0.01216602 0.01215434 0.01211047 0.01222229]
mean value: 0.012152671813964844
key: test_mcc
value: [0.7935982 0.92719967 0.8789643 0.98181818 0.90838671 0.96393713
0.92905936 0.96329966 1. 0.96393713]
mean value: 0.9310200334445711
key: train_mcc
value: [0.81490143 0.9677754 0.94445227 0.9836901 0.94386773 0.96789632
0.99185333 0.98372267 0.9367601 0.98175107]
mean value: 0.951667042747277
key: test_accuracy
value: [0.88990826 0.96330275 0.93577982 0.99082569 0.95412844 0.98165138
0.96330275 0.98165138 1. 0.98165138]
mean value: 0.9642201834862385
key: train_accuracy
value: [0.89908257 0.98369011 0.9714577 0.99184506 0.9714577 0.98369011
0.99592253 0.99184506 0.96738022 0.99082569]
mean value: 0.9747196738022426
key: test_fscore
value: [0.89830508 0.96226415 0.93913043 0.99082569 0.95327103 0.98214286
0.96491228 0.98181818 1. 0.98214286]
mean value: 0.9654812563388195
key: train_fscore
value: [0.90841813 0.98347107 0.97227723 0.99185336 0.97083333 0.98393574
0.99592668 0.99180328 0.96837945 0.99075026]
mean value: 0.9757648532767356
key: test_precision
value: [0.828125 0.98076923 0.8852459 0.98181818 0.96226415 0.96491228
0.93220339 0.98181818 1. 0.96491228]
mean value: 0.9482068598222352
key: train_precision
value: [0.83220339 0.99790356 0.9460501 0.99185336 0.99360341 0.96837945
0.99390244 0.99588477 0.93869732 0.99792961]
mean value: 0.9656407406073759
key: test_recall
value: [0.98148148 0.94444444 1. 1. 0.94444444 1.
1. 0.98181818 1. 1. ]
mean value: 0.9852188552188552
key: train_recall
value: [1. 0.9694501 1. 0.99185336 0.9490835 1.
0.99795918 0.9877551 1. 0.98367347]
mean value: 0.9879774720478822
key: test_roc_auc
value: [0.89074074 0.96313131 0.93636364 0.99090909 0.9540404 0.98148148
0.96296296 0.98164983 1. 0.98148148]
mean value: 0.9642760942760943
key: train_roc_auc
value: [0.89897959 0.98370464 0.97142857 0.99184505 0.97148053 0.98370672
0.9959246 0.99184089 0.96741344 0.9908184 ]
mean value: 0.9747142441497983
key: test_jcc
value: [0.81538462 0.92727273 0.8852459 0.98181818 0.91071429 0.96491228
0.93220339 0.96428571 1. 0.96491228]
mean value: 0.9346749377348886
key: train_jcc
value: [0.83220339 0.96747967 0.9460501 0.98383838 0.94331984 0.96837945
0.99188641 0.98373984 0.93869732 0.98167006]
mean value: 0.9537264455743892
MCC on Blind test: 0.33
Accuracy on Blind test: 0.62
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.51863694 0.50481296 0.51417375 0.50742078 0.510185 0.50463581
0.50489211 0.50328565 0.51146102 0.50386643]
mean value: 0.5083370447158814
key: score_time
value: [0.01617646 0.01697731 0.01649475 0.01617336 0.01631141 0.0161674
0.01612544 0.01631689 0.01627827 0.01620293]
mean value: 0.01632242202758789
key: test_mcc
value: [0.89053558 0.98181211 0.98181211 0.98181211 0.98181818 0.98181211
0.94509941 0.98181211 1. 0.98181211]
mean value: 0.9708325861574671
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94495413 0.99082569 0.99082569 0.99082569 0.99082569 0.99082569
0.97247706 0.99082569 1. 0.99082569]
mean value: 0.9853211009174312
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94545455 0.99065421 0.99065421 0.99065421 0.99082569 0.99099099
0.97297297 0.99099099 1. 0.99099099]
mean value: 0.9854188796296316
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92857143 1. 1. 1. 0.98181818 0.98214286
0.96428571 0.98214286 1. 0.98214286]
mean value: 0.9821103896103895
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.98148148 0.98148148 0.98148148 1. 1.
0.98181818 1. 1. 1. ]
mean value: 0.988922558922559
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94511785 0.99074074 0.99074074 0.99074074 0.99090909 0.99074074
0.97239057 0.99074074 1. 0.99074074]
mean value: 0.9852861952861953
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89655172 0.98148148 0.98148148 0.98148148 0.98181818 0.98214286
0.94736842 0.98214286 1. 0.98214286]
mean value: 0.971661134288176
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.21921444 0.22345185 0.21574569 0.2206254 0.22202802 0.13987017
0.22048545 0.23433447 0.23488665 0.21739197]
mean value: 0.21480340957641603
key: score_time
value: [0.03116179 0.02156115 0.04347515 0.02509618 0.03190017 0.02411628
0.04481173 0.02482653 0.038275 0.03011703]
mean value: 0.03153409957885742
key: test_mcc
value: [0.92659933 0.98181211 0.94509941 0.98181211 0.96396098 0.94511785
0.94511785 0.94511785 0.98181211 0.98181211]
mean value: 0.9598261714386186
key: train_mcc
value: [0.9918781 0.99390241 0.99388586 0.99592252 0.99390241 0.99185326
0.99592252 0.99187796 0.99796333 0.99388584]
mean value: 0.9940994227573303
key: test_accuracy
value: [0.96330275 0.99082569 0.97247706 0.99082569 0.98165138 0.97247706
0.97247706 0.97247706 0.99082569 0.99082569]
mean value: 0.9798165137614679
key: train_accuracy
value: [0.99592253 0.9969419 0.9969419 0.99796126 0.9969419 0.99592253
0.99796126 0.99592253 0.99898063 0.9969419 ]
mean value: 0.9970438328236494
key: test_fscore
value: [0.96296296 0.99065421 0.97196262 0.99065421 0.98181818 0.97247706
0.97247706 0.97247706 0.99099099 0.99099099]
mean value: 0.9797465347461061
key: train_fscore
value: [0.99591002 0.99693565 0.9969419 0.99796334 0.99693565 0.99591002
0.99795918 0.99590164 0.99897855 0.99693565]
mean value: 0.9970371595467664
key: test_precision
value: [0.96296296 1. 0.98113208 1. 0.96428571 0.98148148
0.98148148 0.98148148 0.98214286 0.98214286]
mean value: 0.9817110911450534
key: train_precision
value: [1. 1. 0.99795918 0.99796334 1. 0.99795082
0.99795918 1. 1. 0.99795501]
mean value: 0.9989787537366218
key: test_recall
value: [0.96296296 0.98148148 0.96296296 0.98148148 1. 0.96363636
0.96363636 0.96363636 1. 1. ]
mean value: 0.977979797979798
key: train_recall
value: [0.99185336 0.99389002 0.99592668 0.99796334 0.99389002 0.99387755
0.99795918 0.99183673 0.99795918 0.99591837]
mean value: 0.9951074441996758
key: test_roc_auc
value: [0.96329966 0.99074074 0.97239057 0.99074074 0.98181818 0.97255892
0.97255892 0.97255892 0.99074074 0.99074074]
mean value: 0.9798148148148148
key: train_roc_auc
value: [0.99592668 0.99694501 0.99694293 0.99796126 0.99694501 0.99592045
0.99796126 0.99591837 0.99897959 0.99694085]
mean value: 0.9970441414855148
key: test_jcc
value: [0.92857143 0.98148148 0.94545455 0.98148148 0.96428571 0.94642857
0.94642857 0.94642857 0.98214286 0.98214286]
mean value: 0.960484607984608
key: train_jcc
value: [0.99185336 0.99389002 0.99390244 0.99593496 0.99389002 0.99185336
0.99592668 0.99183673 0.99795918 0.99389002]
mean value: 0.9940936779063123
MCC on Blind test: 0.21
Accuracy on Blind test: 0.56
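Note: the UserWarning/RuntimeWarning pair logged before this pipeline comes from requesting oob_score=True while BaggingClassifier keeps its default of 10 estimators: with so few bootstrap samples, some training rows appear in every bag and therefore get no out-of-bag prediction. A minimal sketch (toy data, not this script's settings) showing that more estimators make that vanishingly unlikely:

import warnings
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
for n in (10, 200):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        BaggingClassifier(n_estimators=n, oob_score=True, random_state=42).fit(X, y)
        oob_msgs = [w for w in caught if 'OOB' in str(w.message)]
        print(f'n_estimators={n}: {len(oob_msgs)} OOB warning(s)')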
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.45224738 0.63778901 0.5324204 0.5638485 0.78409576 0.54516888
0.51332784 0.58729959 0.45964193 0.45682168]
mean value: 0.5532660961151123
key: score_time
value: [0.02797675 0.04070687 0.04118657 0.0417459 0.0412612 0.02333283
0.02900791 0.04153872 0.02351713 0.03968668]
mean value: 0.03499605655670166
key: test_mcc
value: [0.87171717 0.90841751 0.98181818 0.96329966 0.96396098 0.96393713
0.92719967 0.92719967 0.96329966 0.96393713]
mean value: 0.9434786761144899
key: train_mcc
value: [0.99593079 1. 0.99593079 0.99593079 0.99796333 0.99593082
0.99593082 0.99796334 0.99593082 0.99593082]
mean value: 0.9967442309031642
key: test_accuracy
value: [0.93577982 0.95412844 0.99082569 0.98165138 0.98165138 0.98165138
0.96330275 0.96330275 0.98165138 0.98165138]
mean value: 0.9715596330275229
key: train_accuracy
value: [0.99796126 1. 0.99796126 0.99796126 0.99898063 0.99796126
0.99796126 0.99898063 0.99796126 0.99796126]
mean value: 0.9983690112130479
key: test_fscore
value: [0.93577982 0.95412844 0.99082569 0.98148148 0.98181818 0.98214286
0.96428571 0.96428571 0.98181818 0.98214286]
mean value: 0.9718708932929117
key: train_fscore
value: [0.99796748 1. 0.99796748 0.99796748 0.99898271 0.99796334
0.99796334 0.99898063 0.99796334 0.99796334]
mean value: 0.9983719137523378
key: test_precision
value: [0.92727273 0.94545455 0.98181818 0.98148148 0.96428571 0.96491228
0.94736842 0.94736842 0.98181818 0.96491228]
mean value: 0.9606692235639605
key: train_precision
value: [0.9959432 1. 0.9959432 0.9959432 0.99796748 0.99593496
0.99593496 0.99796334 0.99593496 0.99593496]
mean value: 0.9967500271799833
key: test_recall
value: [0.94444444 0.96296296 1. 0.98148148 1. 1.
0.98181818 0.98181818 0.98181818 1. ]
mean value: 0.9834343434343434
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93585859 0.95420875 0.99090909 0.98164983 0.98181818 0.98148148
0.96313131 0.96313131 0.98164983 0.98148148]
mean value: 0.9715319865319865
key: train_roc_auc
value: [0.99795918 1. 0.99795918 0.99795918 0.99897959 0.99796334
0.99796334 0.99898167 0.99796334 0.99796334]
mean value: 0.9983692173407042
key: test_jcc
value: [0.87931034 0.9122807 0.98181818 0.96363636 0.96428571 0.96491228
0.93103448 0.93103448 0.96428571 0.96491228]
mean value: 0.9457510547528696
key: train_jcc
value: [0.9959432 1. 0.9959432 0.9959432 0.99796748 0.99593496
0.99593496 0.99796334 0.99593496 0.99593496]
mean value: 0.9967500271799833
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
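The pipeline printed above MinMax-scales the 169 numerical columns, one-hot encodes the 7 categorical columns, passes any remaining columns through unchanged, and then fits the classifier. A minimal reconstruction of that assembly is sketched below; the column lists are abbreviated to the names visible in this log, and the variable names are assumptions rather than the original code.

# Illustrative reconstruction of the printed pipeline (column lists abbreviated).
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numerical_cols = ['ligand_distance', 'ligand_affinity_change',
                  'duet_stability_change']              # ...169 columns in total
categorical_cols = ['ss_class', 'aa_prop_change', 'electrostatics_change',
                    'polarity_change', 'water_change',
                    'drtype_mode_labels', 'active_site']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), numerical_cols),        # scale numerical features to [0, 1]
        ('cat', OneHotEncoder(), categorical_cols),     # encode the 7 categorical features
    ],
    remainder='passthrough',                            # keep any remaining columns as-is
)

model_pipeline = Pipeline(steps=[
    ('prep', preprocessor),
    ('model', GradientBoostingClassifier(random_state=42)),
])
# model_pipeline.fit(X_train, y_train) would then train exactly this setup.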
key: fit_time
value: [2.25848699 2.24054694 2.27098012 2.30772567 2.28769517 2.29379869
2.27469635 2.2514286 2.25546622 2.24115229]
mean value: 2.268197703361511
key: score_time
value: [0.00967145 0.00950313 0.00958705 0.01027203 0.00979424 0.01037478
0.00971627 0.00944853 0.00973773 0.00967526]
mean value: 0.009778046607971191
key: test_mcc
value: [0.92659933 1. 0.96393713 0.98181211 0.96396098 0.94511785
0.94511785 0.94511785 0.98181211 0.94635821]
mean value: 0.9599833416841214
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 1. 0.98165138 0.99082569 0.98165138 0.97247706
0.97247706 0.97247706 0.99082569 0.97247706]
mean value: 0.9798165137614679
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 1. 0.98113208 0.99065421 0.98181818 0.97247706
0.97247706 0.97247706 0.99099099 0.97345133]
mean value: 0.979844093694549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 1. 1. 1. 0.96428571 0.98148148
0.98148148 0.98148148 0.98214286 0.94827586]
mean value: 0.9802111840904945
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 1. 0.96296296 0.98148148 1. 0.96363636
0.96363636 0.96363636 1. 1. ]
mean value: 0.9798316498316498
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96329966 1. 0.98148148 0.99074074 0.98181818 0.97255892
0.97255892 0.97255892 0.99074074 0.97222222]
mean value: 0.9797979797979798
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 1. 0.96296296 0.98148148 0.96428571 0.94642857
0.94642857 0.94642857 0.98214286 0.94827586]
mean value: 0.9607006020799124
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
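The two summary lines above correspond to scoring the fitted pipeline on the held-out blind test split. A minimal sketch of that step is given below; the function and argument names (X_bts, y_bts) are placeholders, not identifiers from the original script.

# Sketch of the blind-test summary, assuming a fitted pipeline and a held-out
# blind test split; X_bts / y_bts are placeholder names.
from sklearn.metrics import accuracy_score, matthews_corrcoef

def report_blind_test(fitted_pipeline, X_bts, y_bts):
    y_pred = fitted_pipeline.predict(X_bts)
    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))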
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04199314 0.04448581 0.05012202 0.05785346 0.04534864 0.04628587
0.04821253 0.04629254 0.04560542 0.06620932]
mean value: 0.04924087524414063
key: score_time
value: [0.02291536 0.0129447 0.01291752 0.02138734 0.01435804 0.01392055
0.02247977 0.01402569 0.01400256 0.02144146]
mean value: 0.01703929901123047
key: test_mcc
value: [0.98181211 1. 1. 1. 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9963630295440247
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.99082569 1. 1. 1. 1. 1.
1. 1. 0.99082569 1. ]
mean value: 0.998165137614679
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.99065421 1. 1. 1. 1. 1.
1. 1. 0.99082569 1. ]
mean value: 0.9981479893680871
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98148148 1. 1. 1. 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9963299663299663
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.99074074 1. 1. 1. 1. 1.
1. 1. 0.99090909 1. ]
mean value: 0.9981649831649831
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.98148148 1. 1. 1. 1. 1.
1. 1. 0.98181818 1. ]
mean value: 0.9963299663299663
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.0
Accuracy on Blind test: 0.51
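The "Variables are collinear" UserWarning repeated above is raised by scikit-learn's QuadraticDiscriminantAnalysis when the per-class covariance estimate is rank-deficient. One common mitigation, shown here only as a hedged sketch (the logged run used the default reg_param=0.0), is to regularise the covariance estimate:

# Hedged sketch: regularising QDA to soften the "Variables are collinear"
# warning seen above. The reg_param value is illustrative, not from this run.
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

qda_regularised = QuadraticDiscriminantAnalysis(reg_param=0.1)
# qda_regularised would replace QuadraticDiscriminantAnalysis() in the
# ('model', ...) step of the pipeline printed above.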
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.035707 0.04087877 0.04267907 0.0428102 0.0426352 0.04230046
0.04302907 0.04247069 0.04260826 0.0430162 ]
mean value: 0.04181349277496338
key: score_time
value: [0.01995635 0.01925468 0.01918221 0.01924872 0.01916957 0.01920724
0.01924849 0.01924205 0.01930523 0.01914072]
mean value: 0.019295525550842286
key: test_mcc
value: [0.946411 0.90967354 0.98181818 0.98181818 0.96396098 1.
0.87869377 0.96393713 1. 0.91202529]
mean value: 0.9538338072232938
key: train_mcc
value: [0.98181631 0.9778193 0.9737407 0.9738378 0.9738378 0.97383919
0.9778203 0.97383919 0.97383919 0.97981668]
mean value: 0.9760206475633355
key: test_accuracy
value: [0.97247706 0.95412844 0.99082569 0.99082569 0.98165138 1.
0.93577982 0.98165138 1. 0.95412844]
mean value: 0.9761467889908257
key: train_accuracy
value: [0.99082569 0.98878695 0.98674822 0.98674822 0.98674822 0.98674822
0.98878695 0.98674822 0.98674822 0.98980632]
mean value: 0.9878695208970438
key: test_fscore
value: [0.97297297 0.95495495 0.99082569 0.99082569 0.98181818 1.
0.94017094 0.98214286 1. 0.95652174]
mean value: 0.9770233022337131
key: train_fscore
value: [0.99091826 0.98892246 0.98690836 0.98693467 0.98693467 0.98690836
0.9889001 0.98690836 0.98690836 0.98989899]
mean value: 0.9880142593158917
key: test_precision
value: [0.94736842 0.92982456 0.98181818 0.98181818 0.96428571 1.
0.88709677 0.96491228 1. 0.91666667]
mean value: 0.9573790781940188
key: train_precision
value: [0.982 0.97808765 0.97609562 0.97420635 0.97420635 0.97415507
0.97804391 0.97415507 0.97415507 0.98 ]
mean value: 0.9765105086268133
key: test_recall
value: [1. 0.98148148 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9981481481481481
key: train_recall
value: [1. 1. 0.99796334 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.99979633401222
key: test_roc_auc
value: [0.97272727 0.9543771 0.99090909 0.99090909 0.98181818 1.
0.93518519 0.98148148 1. 0.9537037 ]
mean value: 0.976111111111111
key: train_roc_auc
value: [0.99081633 0.98877551 0.98673677 0.98673469 0.98673469 0.98676171
0.98879837 0.98676171 0.98676171 0.9898167 ]
mean value: 0.9878698200257701
key: test_jcc
value: [0.94736842 0.9137931 0.98181818 0.98181818 0.96428571 1.
0.88709677 0.96491228 1. 0.91666667]
mean value: 0.9557759323984955
key: train_jcc
value: [0.982 0.97808765 0.97415507 0.97420635 0.97420635 0.97415507
0.97804391 0.97415507 0.97415507 0.98 ]
mean value: 0.9763164538320758
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.26908207 0.36156917 0.38551164 0.42597938 0.36128283 0.35268307
0.35298371 0.35420609 0.44186163 0.36942291]
mean value: 0.36745824813842776
key: score_time
value: [0.02261543 0.01918912 0.02010345 0.01927495 0.01930094 0.02481413
0.01924181 0.01960397 0.02101731 0.01992035]
mean value: 0.02050814628601074
key: test_mcc
value: [0.88989899 0.92724828 0.98181818 0.98181818 0.96396098 1.
0.8582788 0.98181211 0.96396098 0.91202529]
mean value: 0.9460821804669207
key: train_mcc
value: [0.96766902 0.96359022 0.9737407 0.9738378 0.95942381 0.95961736
0.9778203 0.95742826 0.957524 0.96758197]
mean value: 0.9658233448957639
key: test_accuracy
value: [0.94495413 0.96330275 0.99082569 0.99082569 0.98165138 1.
0.9266055 0.99082569 0.98165138 0.95412844]
mean value: 0.9724770642201835
key: train_accuracy
value: [0.98369011 0.98165138 0.98674822 0.98674822 0.97961264 0.97961264
0.98878695 0.97859327 0.97859327 0.98369011]
mean value: 0.9827726809378186
key: test_fscore
value: [0.94444444 0.96363636 0.99082569 0.99082569 0.98181818 1.
0.93103448 0.99099099 0.98148148 0.95652174]
mean value: 0.9731579060407307
key: train_fscore
value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:135: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:138: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.98390342 0.98189135 0.98690836 0.98693467 0.97983871 0.97987928
0.9889001 0.97880928 0.97885196 0.98383838]
mean value: 0.9829755517864163
key: test_precision
value: [0.94444444 0.94642857 0.98181818 0.98181818 0.96428571 1.
0.8852459 0.98214286 1. 0.91666667]
mean value: 0.9602850519243962
key: train_precision
value: [0.972167 0.97017893 0.97609562 0.97420635 0.97005988 0.96626984
0.97804391 0.96806387 0.96620278 0.974 ]
mean value: 0.9715288180430208
key: test_recall
value: [0.94444444 0.98148148 1. 1. 1. 1.
0.98181818 1. 0.96363636 1. ]
mean value: 0.9871380471380471
key: train_recall
value: [0.99592668 0.99389002 0.99796334 1. 0.9898167 0.99387755
1. 0.98979592 0.99183673 0.99387755]
mean value: 0.9946984496446236
key: test_roc_auc
value: [0.94494949 0.96346801 0.99090909 0.99090909 0.98181818 1.
0.92609428 0.99074074 0.98181818 0.9537037 ]
mean value: 0.9724410774410774
key: train_roc_auc
value: [0.98367763 0.98163889 0.98673677 0.98673469 0.97960223 0.97962717
0.98879837 0.97860468 0.97860676 0.98370049]
mean value: 0.9827727669479197
key: test_jcc
value: [0.89473684 0.92982456 0.98181818 0.98181818 0.96428571 1.
0.87096774 0.98214286 0.96363636 0.91666667]
mean value: 0.9485897110812221
key: train_jcc
value: [0.96831683 0.96442688 0.97415507 0.97420635 0.96047431 0.96055227
0.97804391 0.95849802 0.95857988 0.96819085]
mean value: 0.9665444376905993
MCC on Blind test: 0.32
Accuracy on Blind test: 0.61
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04058647 0.04435062 0.04309654 0.04436064 0.04376888 0.0449791
0.04290581 0.04352117 0.04341054 0.04521823]
mean value: 0.04361979961395264
key: score_time
value: [0.01262784 0.01453471 0.01264024 0.01438498 0.01472068 0.01512718
0.01484346 0.01487494 0.0149014 0.01476884]
mean value: 0.014342427253723145
key: test_mcc
value: [0.946411 0.96396098 0.98181818 0.96396098 0.96396098 0.94635821
0.91202529 0.98181211 0.96393713 0.92905936]
mean value: 0.9553304233697641
key: train_mcc
value: [0.96987162 0.96987162 0.96592058 0.96592058 0.96987162 0.97185442
0.97383919 0.97383919 0.96395333 0.97582781]
mean value: 0.9700769986001025
key: test_accuracy
value: [0.97247706 0.98165138 0.99082569 0.98165138 0.98165138 0.97247706
0.95412844 0.99082569 0.98165138 0.96330275]
mean value: 0.9770642201834863
key: train_accuracy
value: [0.98470948 0.98470948 0.98267074 0.98267074 0.98470948 0.98572885
0.98674822 0.98674822 0.98165138 0.98776758]
mean value: 0.9848114169215086
key: test_fscore
value: [0.97297297 0.98181818 0.99082569 0.98181818 0.98181818 0.97345133
0.95652174 0.99099099 0.98214286 0.96491228]
mean value: 0.9777272401900579
key: train_fscore
value: [0.98495486 0.98495486 0.98298298 0.98298298 0.98495486 0.98591549
0.98690836 0.98690836 0.98196393 0.98790323]
mean value: 0.9850429923386353
key: test_precision
value: [0.94736842 0.96428571 0.98181818 0.96428571 0.96428571 0.94827586
0.91666667 0.98214286 0.96491228 0.93220339]
mean value: 0.9566244802138708
key: train_precision
value: [0.97035573 0.97035573 0.96653543 0.96653543 0.97035573 0.97222222
0.97415507 0.97415507 0.96456693 0.97609562]
mean value: 0.9705332967868592
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 0.99090909 0.98181818 0.98181818 0.97222222
0.9537037 0.99074074 0.98148148 0.96296296]
mean value: 0.977020202020202
key: train_roc_auc
value: [0.98469388 0.98469388 0.98265306 0.98265306 0.98469388 0.98574338
0.98676171 0.98676171 0.98167006 0.98778004]
mean value: 0.9848104659379027
key: test_jcc
value: [0.94736842 0.96428571 0.98181818 0.96428571 0.96428571 0.94827586
0.91666667 0.98214286 0.96491228 0.93220339]
mean value: 0.9566244802138708
key: train_jcc
value: [0.97035573 0.97035573 0.96653543 0.96653543 0.97035573 0.97222222
0.97415507 0.97415507 0.96456693 0.97609562]
mean value: 0.9705332967868592
MCC on Blind test: 0.33
Accuracy on Blind test: 0.61
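The lbfgs ConvergenceWarning blocks that follow (during the Logistic RegressionCV folds below, and earlier in this log) already state the remedy: raise max_iter or rescale the data. A minimal sketch of the first option is given below; the max_iter value is illustrative only, and the logged runs used the scikit-learn defaults shown in the model list.

# Sketch of the fix suggested by the ConvergenceWarning text itself: give lbfgs
# more iterations. The value 5000 is illustrative, not taken from this run.
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

logreg = LogisticRegression(random_state=42, max_iter=5000)
logreg_cv = LogisticRegressionCV(random_state=42, max_iter=5000)
# Either estimator would slot into the ('model', ...) step of the pipelines
# printed in this log, in place of the default-max_iter versions.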
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
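The lbfgs ConvergenceWarning above recurs throughout the logistic regression fits in this run. Following the advice in the message, a minimal sketch of the usual remedies (the values below are illustrative, not the settings used here):

from sklearn.linear_model import LogisticRegressionCV

# Illustrative only: raise the iteration cap and/or switch to a solver
# that converges more readily on the scaled features.
clf = LogisticRegressionCV(max_iter=3000, solver='saga', random_state=42)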
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
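For reference, a preprocessing-plus-model pipeline of the shape printed above can be assembled roughly as sketched below; the column lists are abbreviated placeholders, whereas the actual run scales the 169 numerical columns and one-hot encodes the 7 categorical columns shown in the printout.

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegressionCV

# Placeholder column names standing in for the full feature lists above.
numerical_cols = ['ligand_distance', 'ligand_affinity_change']
categorical_cols = ['ss_class', 'active_site']

prep = ColumnTransformer(
    transformers=[('num', MinMaxScaler(), numerical_cols),
                  ('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegressionCV(random_state=42))])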
key: fit_time
value: [1.17332339 1.07144642 1.20221496 1.05029082 1.10251975 1.00360799
1.09597087 1.00588417 1.12927198 1.03671074]
mean value: 1.0871241092681885
key: score_time
value: [0.01915336 0.0194943 0.01916051 0.03661346 0.01905918 0.0230329
0.02343559 0.01981854 0.02063799 0.01904988]
mean value: 0.02194557189941406
key: test_mcc
value: [0.96396098 0.96396098 0.98181818 0.98181818 0.9291517 0.94635821
0.92905936 0.98181211 0.98181211 0.92905936]
mean value: 0.9588811186918972
key: train_mcc
value: [1. 0.99796333 1. 1. 1. 0.9778203
0.99796334 1. 1. 0.99796334]
mean value: 0.9971710313107653
key: test_accuracy
value: [0.98165138 0.98165138 0.99082569 0.99082569 0.96330275 0.97247706
0.96330275 0.99082569 0.99082569 0.96330275]
mean value: 0.9788990825688073
key: train_accuracy
value: [1. 0.99898063 1. 1. 1. 0.98878695
0.99898063 1. 1. 0.99898063]
mean value: 0.9985728848114169
key: test_fscore
value: [0.98181818 0.98181818 0.99082569 0.99082569 0.96428571 0.97345133
0.96491228 0.99099099 0.99099099 0.96491228]
mean value: 0.9794831324887986
key: train_fscore
value: [1. 0.99898271 1. 1. 1. 0.9889001
0.99898063 1. 1. 0.99898063]
mean value: 0.9985844070926517
key: test_precision
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.93103448 0.94827586
0.93220339 0.98214286 0.98214286 0.93220339]
mean value: 0.9600210630982109
key: train_precision
value: [1. 0.99796748 1. 1. 1. 0.97804391
0.99796334 1. 1. 0.99796334]
mean value: 0.9971938072094845
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.98181818 0.99090909 0.99090909 0.96363636 0.97222222
0.96296296 0.99074074 0.99074074 0.96296296]
mean value: 0.9788720538720539
key: train_roc_auc
value: [1. 0.99897959 1. 1. 1. 0.98879837
0.99898167 1. 1. 0.99898167]
mean value: 0.9985741302631032
key: test_jcc
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.93103448 0.94827586
0.93220339 0.98214286 0.98214286 0.93220339]
mean value: 0.9600210630982109
key: train_jcc
value: [1. 0.99796748 1. 1. 1. 0.97804391
0.99796334 1. 1. 0.99796334]
mean value: 0.9971938072094845
MCC on Blind test: 0.21
Accuracy on Blind test: 0.56
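The key/value/mean blocks above have the shape of the dictionary returned by sklearn's cross_validate with return_train_score=True and custom scorers. A sketch of how such output can be produced; the scorer mapping, cv=10 and the names pipe, X_train and y_train are assumptions, not taken from the script:

from sklearn.model_selection import cross_validate
from sklearn.metrics import (make_scorer, matthews_corrcoef, accuracy_score,
                             f1_score, precision_score, recall_score,
                             roc_auc_score, jaccard_score)

# Scorer names mirror the keys logged above (mcc, accuracy, fscore, ...).
scoring = {'mcc':       make_scorer(matthews_corrcoef),
           'accuracy':  make_scorer(accuracy_score),
           'fscore':    make_scorer(f1_score),
           'precision': make_scorer(precision_score),
           'recall':    make_scorer(recall_score),
           'roc_auc':   make_scorer(roc_auc_score),
           'jcc':       make_scorer(jaccard_score)}

scores = cross_validate(pipe, X_train, y_train, cv=10,
                        scoring=scoring, return_train_score=True)
for k, v in scores.items():   # fit_time, score_time, test_mcc, train_mcc, ...
    print('key:', k)
    print('value:', v)
    print('mean value:', v.mean())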
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.03226662 0.01357055 0.01272416 0.01376796 0.01404357 0.01343679
0.01265812 0.01321292 0.01250315 0.01331568]
mean value: 0.015149950981140137
key: score_time
value: [0.01026297 0.01015615 0.01043391 0.00975466 0.01017189 0.00939393
0.009799 0.00966072 0.0104816 0.00953364]
mean value: 0.009964847564697265
key: test_mcc
value: [0.76807644 0.8585559 0.77238262 0.75632883 0.75156361 0.69717653
0.82131618 0.846254 0.87280881 0.75090861]
mean value: 0.7895371517463353
key: train_mcc
value: [0.79150846 0.78327281 0.79289836 0.79289836 0.79885223 0.79385491
0.80032507 0.79025852 0.78014362 0.79156576]
mean value: 0.7915578108083023
key: test_accuracy
value: [0.88073394 0.9266055 0.88073394 0.87155963 0.87155963 0.8440367
0.90825688 0.91743119 0.93577982 0.87155963]
mean value: 0.8908256880733945
key: train_accuracy
value: [0.89194699 0.88786952 0.89296636 0.89296636 0.89602446 0.89296636
0.89704383 0.89194699 0.88583078 0.89194699]
mean value: 0.8921508664627931
key: test_fscore
value: [0.88695652 0.92982456 0.88888889 0.88135593 0.87931034 0.85714286
0.9137931 0.92436975 0.9380531 0.88135593]
mean value: 0.8981050987101319
key: train_fscore
value: [0.89904762 0.8952381 0.89971347 0.89971347 0.90248566 0.89990467
0.90297791 0.89827255 0.89353612 0.89885496]
mean value: 0.898974452130224
key: test_precision
value: [0.83606557 0.88333333 0.82539683 0.8125 0.82258065 0.796875
0.86885246 0.859375 0.9137931 0.82539683]
mean value: 0.8444168765523435
key: train_precision
value: [0.84436494 0.84078712 0.8471223 0.8471223 0.85045045 0.84436494
0.85299456 0.84782609 0.83629893 0.84408602]
mean value: 0.8455417645600413
key: test_recall
value: [0.94444444 0.98148148 0.96296296 0.96296296 0.94444444 0.92727273
0.96363636 1. 0.96363636 0.94545455]
mean value: 0.9596296296296296
key: train_recall
value: [0.96130346 0.95723014 0.9592668 0.9592668 0.96130346 0.96326531
0.95918367 0.95510204 0.95918367 0.96122449]
mean value: 0.959632985577123
key: test_roc_auc
value: [0.88131313 0.92710438 0.88148148 0.87239057 0.87222222 0.84326599
0.90774411 0.91666667 0.93552189 0.87087542]
mean value: 0.8908585858585859
key: train_roc_auc
value: [0.89187622 0.88779874 0.89289871 0.89289871 0.89595785 0.89303795
0.89710711 0.89201131 0.88590548 0.89201754]
mean value: 0.892150962217881
key: test_jcc
value: [0.796875 0.86885246 0.8 0.78787879 0.78461538 0.75
0.84126984 0.859375 0.88333333 0.78787879]
mean value: 0.8160078593992528
key: train_jcc
value: [0.816609 0.81034483 0.81770833 0.81770833 0.82229965 0.81802426
0.82311734 0.81533101 0.80756014 0.81629116]
mean value: 0.8164994052884171
MCC on Blind test: 0.46
Accuracy on Blind test: 0.71
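Each model block ends with these two blind-test lines, i.e. the fitted pipeline scored once on the held-out test split. A minimal sketch, assuming pipe is the pipeline printed above and X_bts/y_bts hold the blind-test split:

from sklearn.metrics import matthews_corrcoef, accuracy_score

# Assumed names: X_train/y_train = training split, X_bts/y_bts = blind test split.
pipe.fit(X_train, y_train)
y_bts_pred = pipe.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_bts_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_bts_pred), 2))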
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01432371 0.01744103 0.01744962 0.01802087 0.01732731 0.01733041
0.0174067 0.01744223 0.01742673 0.01739192]
mean value: 0.017156052589416503
key: score_time
value: [0.01219678 0.01250386 0.01259708 0.01271725 0.01278138 0.01281834
0.01276517 0.01271963 0.01281548 0.01277828]
mean value: 0.01266932487487793
key: test_mcc
value: [0.63446308 0.78024981 0.72570999 0.72570999 0.65151515 0.81882759
0.7280447 0.72598622 0.74351235 0.76486923]
mean value: 0.7298888092654613
key: train_mcc
value: [0.73813656 0.72410437 0.73869429 0.73014127 0.75785401 0.71345039
0.73419371 0.73010884 0.75399333 0.72585556]
mean value: 0.7346532327609935
key: test_accuracy
value: [0.81651376 0.88990826 0.86238532 0.86238532 0.82568807 0.90825688
0.86238532 0.86238532 0.87155963 0.88073394]
mean value: 0.8642201834862385
key: train_accuracy
value: [0.86850153 0.86136595 0.86850153 0.86442406 0.87869521 0.85626911
0.86646279 0.86442406 0.87665647 0.86238532]
mean value: 0.8667686034658512
key: test_fscore
value: [0.80769231 0.88679245 0.85714286 0.85714286 0.82568807 0.90566038
0.85714286 0.85981308 0.875 0.87619048]
mean value: 0.8608265343006679
key: train_fscore
value: [0.86492147 0.85714286 0.86406744 0.86044071 0.87668394 0.85235602
0.86225026 0.86014721 0.8738269 0.85834208]
mean value: 0.8630178891837997
key: test_precision
value: [0.84 0.90384615 0.88235294 0.88235294 0.81818182 0.94117647
0.9 0.88461538 0.85964912 0.92 ]
mean value: 0.883217483239155
key: train_precision
value: [0.89008621 0.88503254 0.89519651 0.88744589 0.89240506 0.87526882
0.88937093 0.88720174 0.89339019 0.88336933]
mean value: 0.8878767209813069
key: test_recall
value: [0.77777778 0.87037037 0.83333333 0.83333333 0.83333333 0.87272727
0.81818182 0.83636364 0.89090909 0.83636364]
mean value: 0.8402693602693603
key: train_recall
value: [0.84114053 0.83095723 0.83503055 0.83503055 0.86150713 0.83061224
0.83673469 0.83469388 0.85510204 0.83469388]
mean value: 0.8395502722473919
key: test_roc_auc
value: [0.81616162 0.88973064 0.86212121 0.86212121 0.82575758 0.90858586
0.86279461 0.86262626 0.87138047 0.88114478]
mean value: 0.8642424242424243
key: train_roc_auc
value: [0.86852945 0.86139698 0.86853568 0.86445405 0.87871275 0.85624299
0.86643252 0.86439378 0.87663452 0.86235712]
mean value: 0.8667689845795752
key: test_jcc
value: [0.67741935 0.79661017 0.75 0.75 0.703125 0.82758621
0.75 0.75409836 0.77777778 0.77966102]
mean value: 0.7566277886609455
key: train_jcc
value: [0.76199262 0.75 0.7606679 0.75506446 0.7804428 0.74270073
0.75785582 0.75461255 0.77592593 0.75183824]
mean value: 0.7591101044424549
MCC on Blind test: 0.36
Accuracy on Blind test: 0.66
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01675367 0.01172948 0.01112866 0.01174927 0.01270866 0.01112056
0.01259708 0.01317763 0.01103449 0.01173782]
mean value: 0.012373733520507812
key: score_time
value: [0.03388858 0.01459813 0.0150125 0.01497555 0.0157063 0.01583314
0.01537919 0.0203886 0.01956248 0.01527023]
mean value: 0.0180614709854126
key: test_mcc
value: [0.91216737 0.946411 0.946411 0.98181818 1. 0.94635821
0.91202529 0.91202529 1. 0.96393713]
mean value: 0.9521153465030314
key: train_mcc
value: [0.96789422 0.96592058 0.96592058 0.96395068 0.96198449 0.96198744
0.96395333 0.97185442 0.96002526 0.96002526]
mean value: 0.9643516270143173
key: test_accuracy
value: [0.95412844 0.97247706 0.97247706 0.99082569 1. 0.97247706
0.95412844 0.95412844 1. 0.98165138]
mean value: 0.9752293577981652
key: train_accuracy
value: [0.98369011 0.98267074 0.98267074 0.98165138 0.98063201 0.98063201
0.98165138 0.98572885 0.97961264 0.97961264]
mean value: 0.981855249745158
key: test_fscore
value: [0.95575221 0.97297297 0.97297297 0.99082569 1. 0.97345133
0.95652174 0.95652174 1. 0.98214286]
mean value: 0.9761161509246076
key: train_fscore
value: [0.98396794 0.98298298 0.98298298 0.982 0.98101898 0.98098098
0.98196393 0.98591549 0.98 0.98 ]
mean value: 0.9821813284651129
key: test_precision
value: [0.91525424 0.94736842 0.94736842 0.98181818 1. 0.94827586
0.91666667 0.91666667 1. 0.96491228]
mean value: 0.9538330737315633
key: train_precision
value: [0.96844181 0.96653543 0.96653543 0.96463654 0.9627451 0.96267191
0.96456693 0.97222222 0.96078431 0.96078431]
mean value: 0.9649924005520801
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.97272727 0.97272727 0.99090909 1. 0.97222222
0.9537037 0.9537037 1. 0.98148148]
mean value: 0.9752020202020202
key: train_roc_auc
value: [0.98367347 0.98265306 0.98265306 0.98163265 0.98061224 0.98065173
0.98167006 0.98574338 0.9796334 0.9796334 ]
mean value: 0.9818556465356
key: test_jcc
value: [0.91525424 0.94736842 0.94736842 0.98181818 1. 0.94827586
0.91666667 0.91666667 1. 0.96491228]
mean value: 0.9538330737315633
key: train_jcc
value: [0.96844181 0.96653543 0.96653543 0.96463654 0.9627451 0.96267191
0.96456693 0.97222222 0.96078431 0.96078431]
mean value: 0.9649924005520801
MCC on Blind test: 0.19
Accuracy on Blind test: 0.56
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0302546 0.02813029 0.02809978 0.0278976 0.02761269 0.02798128
0.03074837 0.02966928 0.03014135 0.02853847]
mean value: 0.028907370567321778
key: score_time
value: [0.01462579 0.01377559 0.01365614 0.01365614 0.01376009 0.0138247
0.01475239 0.01382637 0.01384187 0.01394892]
mean value: 0.013966798782348633
key: test_mcc
value: [0.96396098 0.98181818 1. 1. 0.96396098 1.
0.94635821 0.98181211 1. 0.94635821]
mean value: 0.9784268692563587
key: train_mcc
value: [0.98382069 0.98382069 0.98181631 0.98181631 0.98382069 0.98181698
0.98582943 0.98382123 0.97981668 0.98582943]
mean value: 0.983220845209887
key: test_accuracy
value: [0.98165138 0.99082569 1. 1. 0.98165138 1.
0.97247706 0.99082569 1. 0.97247706]
mean value: 0.9889908256880734
key: train_accuracy
value: [0.99184506 0.99184506 0.99082569 0.99082569 0.99184506 0.99082569
0.99286442 0.99184506 0.98980632 0.99286442]
mean value: 0.991539245667686
key: test_fscore
value: [0.98181818 0.99082569 1. 1. 0.98181818 1.
0.97345133 0.99099099 1. 0.97345133]
mean value: 0.9892355697568005
key: train_fscore
value: [0.99191919 0.99191919 0.99091826 0.99091826 0.99191919 0.9908999
0.9929078 0.99190283 0.98989899 0.9929078 ]
mean value: 0.9916111430148137
key: test_precision
value: [0.96428571 0.98181818 1. 1. 0.96428571 1.
0.94827586 0.98214286 1. 0.94827586]
mean value: 0.9789084191670399
key: train_precision
value: [0.98396794 0.98396794 0.982 0.982 0.98396794 0.98196393
0.98591549 0.98393574 0.98 0.98591549]
mean value: 0.9833634464358323
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.99090909 1. 1. 0.98181818 1.
0.97222222 0.99074074 1. 0.97222222]
mean value: 0.988973063973064
key: train_roc_auc
value: [0.99183673 0.99183673 0.99081633 0.99081633 0.99183673 0.99083503
0.99287169 0.99185336 0.9898167 0.99287169]
mean value: 0.9915391329647949
key: test_jcc
value: [0.96428571 0.98181818 1. 1. 0.96428571 1.
0.94827586 0.98214286 1. 0.94827586]
mean value: 0.9789084191670399
key: train_jcc
value: [0.98396794 0.98396794 0.982 0.982 0.98396794 0.98196393
0.98591549 0.98393574 0.98 0.98591549]
mean value: 0.9833634464358323
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.71428227 2.08614016 2.62061143 2.41919136 2.2262578 2.1079421
2.22521067 2.41262627 2.21698761 2.23710108]
mean value: 2.2266350746154786
key: score_time
value: [0.01254702 0.01257324 0.01411819 0.01264691 0.01291251 0.01294541
0.01289654 0.01295042 0.01260424 0.01268959]
mean value: 0.012888407707214356
key: test_mcc
value: [0.96396098 0.96396098 0.98181818 1. 0.96396098 0.98181211
0.92905936 0.98181211 1. 0.92905936]
mean value: 0.9695444073814772
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98165138 0.98165138 0.99082569 1. 0.98165138 0.99082569
0.96330275 0.99082569 1. 0.96330275]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.98181818 0.99082569 1. 0.98181818 0.99099099
0.96491228 0.99099099 1. 0.96491228]
mean value: 0.9848086776913431
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.96428571 0.98181818 1. 0.96428571 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9703367818622056
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.98181818 0.99090909 1. 0.98181818 0.99074074
0.96296296 0.99074074 1. 0.96296296]
mean value: 0.9843771043771044
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.96428571 0.98181818 1. 0.96428571 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9703367818622056
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.59
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04392576 0.0322082 0.03475714 0.03228712 0.0325017 0.033427
0.03042507 0.03155088 0.03268433 0.03452039]
mean value: 0.03382875919342041
key: score_time
value: [0.0112896 0.00917888 0.00912571 0.00906205 0.0092423 0.00923324
0.00922656 0.00922465 0.00923491 0.00925946]
mean value: 0.009407734870910645
key: test_mcc
value: [0.9291517 0.946411 0.98181818 1. 0.98181818 1.
0.94635821 0.98181211 0.98181211 0.94635821]
mean value: 0.969553972019105
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 0.97247706 0.99082569 1. 0.99082569 1.
0.97247706 0.99082569 0.99082569 0.97247706]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.97297297 0.99082569 1. 0.99082569 1.
0.97345133 0.99099099 0.99099099 0.97345133]
mean value: 0.9847794700254715
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.94736842 0.98181818 1. 0.98181818 1.
0.94827586 0.98214286 0.98214286 0.94827586]
mean value: 0.9702876705871261
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96363636 0.97272727 0.99090909 1. 0.99090909 1.
0.97222222 0.99074074 0.99074074 0.97222222]
mean value: 0.9844107744107744
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.94736842 0.98181818 1. 0.98181818 1.
0.94827586 0.98214286 0.98214286 0.94827586]
mean value: 0.9702876705871261
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.18
Accuracy on Blind test: 0.55
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.1274991 0.12812495 0.12774634 0.12863183 0.12888646 0.13110614
0.12780571 0.12781787 0.12792253 0.12815571]
mean value: 0.12836966514587403
key: score_time
value: [0.01829576 0.01841164 0.01846838 0.0185349 0.01836324 0.01849031
0.01845002 0.0185163 0.01841092 0.0183692 ]
mean value: 0.01843106746673584
key: test_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01226139 0.01228285 0.01226473 0.01214695 0.01218414 0.01226926
0.01206851 0.01219273 0.01233721 0.01204967]
mean value: 0.012205743789672851
key: score_time
value: [0.00921822 0.00932813 0.00931048 0.00925422 0.00949788 0.00925541
0.0092206 0.00980568 0.00936985 0.0093658 ]
mean value: 0.009362626075744628
key: test_mcc
value: [0.946411 0.98181818 0.98181818 0.98181818 0.946411 0.96393713
0.98181211 0.96393713 0.96393713 0.96393713]
mean value: 0.9675837174332553
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97247706 0.99082569 0.99082569 0.99082569 0.97247706 0.98165138
0.99082569 0.98165138 0.98165138 0.98165138]
mean value: 0.9834862385321101
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97297297 0.99082569 0.99082569 0.99082569 0.97297297 0.98214286
0.99099099 0.98214286 0.98214286 0.98214286]
mean value: 0.9837985429728549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.98181818 0.98181818 0.98181818 0.94736842 0.96491228
0.98214286 0.96491228 0.96491228 0.96491228]
mean value: 0.9681983367509683
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.99090909 0.99090909 0.99090909 0.97272727 0.98148148
0.99074074 0.98148148 0.98148148 0.98148148]
mean value: 0.9834848484848485
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94736842 0.98181818 0.98181818 0.98181818 0.94736842 0.96491228
0.98214286 0.96491228 0.96491228 0.96491228]
mean value: 0.9681983367509683
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.57
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.84613895 1.84882665 1.8708818 1.86475921 1.84473586 1.86491394
1.85439205 1.88788891 1.88814402 1.87820005]
mean value: 1.8648881435394287
key: score_time
value: [0.10223675 0.09952426 0.09806609 0.09618688 0.09979129 0.1016283
0.09665799 0.10203648 0.09662509 0.10175681]
mean value: 0.09945099353790283
key: test_mcc
value: [0.96396098 0.98181818 1. 1. 1. 1.
0.98181211 0.98181211 1. 0.98181211]
mean value: 0.9891215506967859
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98165138 0.99082569 1. 1. 1. 1.
0.99082569 0.99082569 1. 0.99082569]
mean value: 0.9944954128440368
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.99082569 1. 1. 1. 1.
0.99099099 0.99099099 1. 0.99099099]
mean value: 0.9945616842864549
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.98181818 1. 1. 1. 1.
0.98214286 0.98214286 1. 0.98214286]
mean value: 0.9892532467532468
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.99090909 1. 1. 1. 1.
0.99074074 0.99074074 1. 0.99074074]
mean value: 0.9944949494949495
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.98181818 1. 1. 1. 1.
0.98214286 0.98214286 1. 0.98214286]
mean value: 0.9892532467532468
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.21
Accuracy on Blind test: 0.55
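Note: the "key / value / mean value" blocks above are the shape of output that scikit-learn's cross_validate() produces for a preprocessing-plus-model pipeline scored with several metrics over 10 folds. The sketch below is a minimal, self-contained illustration of that pattern; it is an assumption about the mechanism rather than the script's own code, and the toy data, column names and scorer labels are stand-ins, not the real rpoB feature matrix.

# Hedged sketch: 10-fold CV of a MinMaxScaler + OneHotEncoder + RandomForest
# pipeline, reporting per-fold arrays and their means like the log block above.
# Toy data only; the real features and splits come from the rpoB dataset.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.metrics import make_scorer, matthews_corrcoef

rng = np.random.default_rng(42)
X = pd.DataFrame({'ligand_distance': rng.normal(size=200),
                  'deepddg': rng.normal(size=200),
                  'ss_class': rng.choice(['helix', 'sheet', 'loop'], size=200)})
y = rng.integers(0, 2, size=200)

prep = ColumnTransformer(remainder='passthrough',
                         transformers=[('num', MinMaxScaler(), ['ligand_distance', 'deepddg']),
                                       ('cat', OneHotEncoder(), ['ss_class'])])
pipe = Pipeline([('prep', prep),
                 ('model', RandomForestClassifier(n_estimators=100, random_state=42))])

scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': 'accuracy',
           'fscore': 'f1', 'precision': 'precision', 'recall': 'recall',
           'roc_auc': 'roc_auc', 'jcc': 'jaccard'}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():      # prints key / value / mean value, as in the log
    print('key:', key)
    print('value:', value)
    print('mean value:', value.mean())
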
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.09517479 1.02668071 1.03418374 1.04638362 1.05752635 1.01023602
1.05089188 1.00196314 1.07685375 1.11304617]
mean value: 1.0512940168380738
key: score_time
value: [0.28117323 0.28493404 0.23831916 0.273947 0.27481461 0.26104164
0.21751809 0.22988033 0.25270414 0.24373722]
mean value: 0.2558069467544556
key: test_mcc
value: [0.946411 0.96396098 1. 1. 0.96396098 1.
0.94635821 0.96393713 0.98181211 0.96393713]
mean value: 0.9730377554095189
key: train_mcc
value: [0.99187796 0.98784133 0.98784133 0.98985763 0.98784133 0.98582943
0.98582943 0.98784163 0.98985784 0.98985784]
mean value: 0.9884475773437704
key: test_accuracy
value: [0.97247706 0.98165138 1. 1. 0.98165138 1.
0.97247706 0.98165138 0.99082569 0.98165138]
mean value: 0.9862385321100917
key: train_accuracy
value: [0.99592253 0.99388379 0.99388379 0.99490316 0.99388379 0.99286442
0.99286442 0.99388379 0.99490316 0.99490316]
mean value: 0.9941896024464831
key: test_fscore
value: [0.97297297 0.98181818 1. 1. 0.98181818 1.
0.97345133 0.98214286 0.99099099 0.98214286]
mean value: 0.986533736931967
key: train_fscore
value: [0.9959432 0.99392713 0.99392713 0.99493414 0.99392713 0.9929078
0.9929078 0.99391481 0.99492386 0.99492386]
mean value: 0.9942236851131838
key: test_precision
value: [0.94736842 0.96428571 1. 1. 0.96428571 1.
0.94827586 0.96491228 0.98214286 0.96491228]
mean value: 0.9736183130239392
key: train_precision
value: [0.99191919 0.98792757 0.98792757 0.98991935 0.98792757 0.98591549
0.98591549 0.98790323 0.98989899 0.98989899]
mean value: 0.9885153434454889
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 1. 1. 0.98181818 1.
0.97222222 0.98148148 0.99074074 0.98148148]
mean value: 0.9862289562289562
key: train_roc_auc
value: [0.99591837 0.99387755 0.99387755 0.99489796 0.99387755 0.99287169
0.99287169 0.99389002 0.99490835 0.99490835]
mean value: 0.994189908142483
key: test_jcc
value: [0.94736842 0.96428571 1. 1. 0.96428571 1.
0.94827586 0.96491228 0.98214286 0.96491228]
mean value: 0.9736183130239392
key: train_jcc
value: [0.99191919 0.98792757 0.98792757 0.98991935 0.98792757 0.98591549
0.98591549 0.98790323 0.98989899 0.98989899]
mean value: 0.9885153434454889
MCC on Blind test: 0.29
Accuracy on Blind test: 0.59
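The FutureWarning logged before this run comes from max_features='auto', which scikit-learn 1.1 deprecated; for classifiers 'auto' already resolves to 'sqrt', so an equivalent, warning-free definition of the Random Forest2 model would look like the sketch below (a suggestion, not a change that was made in the script).

from sklearn.ensemble import RandomForestClassifier

# 'sqrt' is what 'auto' already meant for classifiers, so behaviour is unchanged;
# only the deprecated spelling (and hence the FutureWarning) goes away.
rf2 = RandomForestClassifier(max_features='sqrt', min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)
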
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02011395 0.01671338 0.01653075 0.02227449 0.01953197 0.01918554
0.01853752 0.01888394 0.01923013 0.01840615]
mean value: 0.01894078254699707
key: score_time
value: [0.01299596 0.01254225 0.01255369 0.01404405 0.01341295 0.01391649
0.01417804 0.01381135 0.01417184 0.0138309 ]
mean value: 0.013545751571655273
key: test_mcc
value: [0.63446308 0.78024981 0.72570999 0.72570999 0.65151515 0.81882759
0.7280447 0.72598622 0.74351235 0.76486923]
mean value: 0.7298888092654613
key: train_mcc
value: [0.73813656 0.72410437 0.73869429 0.73014127 0.75785401 0.71345039
0.73419371 0.73010884 0.75399333 0.72585556]
mean value: 0.7346532327609935
key: test_accuracy
value: [0.81651376 0.88990826 0.86238532 0.86238532 0.82568807 0.90825688
0.86238532 0.86238532 0.87155963 0.88073394]
mean value: 0.8642201834862385
key: train_accuracy
value: [0.86850153 0.86136595 0.86850153 0.86442406 0.87869521 0.85626911
0.86646279 0.86442406 0.87665647 0.86238532]
mean value: 0.8667686034658512
key: test_fscore
value: [0.80769231 0.88679245 0.85714286 0.85714286 0.82568807 0.90566038
0.85714286 0.85981308 0.875 0.87619048]
mean value: 0.8608265343006679
key: train_fscore
value: [0.86492147 0.85714286 0.86406744 0.86044071 0.87668394 0.85235602
0.86225026 0.86014721 0.8738269 0.85834208]
mean value: 0.8630178891837997
key: test_precision
value: [0.84 0.90384615 0.88235294 0.88235294 0.81818182 0.94117647
0.9 0.88461538 0.85964912 0.92 ]
mean value: 0.883217483239155
key: train_precision
value: [0.89008621 0.88503254 0.89519651 0.88744589 0.89240506 0.87526882
0.88937093 0.88720174 0.89339019 0.88336933]
mean value: 0.8878767209813069
key: test_recall
value: [0.77777778 0.87037037 0.83333333 0.83333333 0.83333333 0.87272727
0.81818182 0.83636364 0.89090909 0.83636364]
mean value: 0.8402693602693603
key: train_recall
value: [0.84114053 0.83095723 0.83503055 0.83503055 0.86150713 0.83061224
0.83673469 0.83469388 0.85510204 0.83469388]
mean value: 0.8395502722473919
key: test_roc_auc
value: [0.81616162 0.88973064 0.86212121 0.86212121 0.82575758 0.90858586
0.86279461 0.86262626 0.87138047 0.88114478]
mean value: 0.8642424242424243
key: train_roc_auc
value: [0.86852945 0.86139698 0.86853568 0.86445405 0.87871275 0.85624299
0.86643252 0.86439378 0.87663452 0.86235712]
mean value: 0.8667689845795752
key: test_jcc
value: [0.67741935 0.79661017 0.75 0.75 0.703125 0.82758621
0.75 0.75409836 0.77777778 0.77966102]
mean value: 0.7566277886609455
key: train_jcc
value: [0.76199262 0.75 0.7606679 0.75506446 0.7804428 0.74270073
0.75785582 0.75461255 0.77592593 0.75183824]
mean value: 0.7591101044424549
MCC on Blind test: 0.36
Accuracy on Blind test: 0.66
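As a reminder of what the metric keys above measure, the sketch below computes each of them once from a single hypothetical set of fold predictions. It uses hard labels throughout, so the roc_auc line is a simplification: the CV scorer in the pipeline uses predicted probabilities or decision scores.

from sklearn.metrics import (matthews_corrcoef, accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             jaccard_score)

# Hypothetical labels/predictions for one fold, purely for illustration.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

print('mcc      :', matthews_corrcoef(y_true, y_pred))
print('accuracy :', accuracy_score(y_true, y_pred))
print('fscore   :', f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print('precision:', precision_score(y_true, y_pred))
print('recall   :', recall_score(y_true, y_pred))
print('roc_auc  :', roc_auc_score(y_true, y_pred))     # on hard labels here; see note above
print('jcc      :', jaccard_score(y_true, y_pred))     # intersection-over-union of the positive class
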
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.11796188 0.10961556 0.10818744 0.10456371 0.1058588 0.1078229
0.25975823 0.10478783 0.10844398 0.13115287]
mean value: 0.1258153200149536
key: score_time
value: [0.01118398 0.011338 0.01129174 0.01120877 0.01139569 0.01124907
0.01117444 0.0112884 0.01124215 0.01141405]
mean value: 0.011278629302978516
key: test_mcc
value: [0.91216737 1. 1. 1. 0.96396098 0.98181211
0.96393713 0.98181211 0.98181211 0.96393713]
mean value: 0.9749438950975768
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95412844 1. 1. 1. 0.98165138 0.99082569
0.98165138 0.99082569 0.99082569 0.98165138]
mean value: 0.9871559633027523
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95575221 1. 1. 1. 0.98181818 0.99099099
0.98214286 0.99099099 0.99099099 0.98214286]
mean value: 0.987482908146625
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91525424 1. 1. 1. 0.96428571 0.98214286
0.96491228 0.98214286 0.98214286 0.96491228]
mean value: 0.975579308440593
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 1. 1. 1. 0.98181818 0.99074074
0.98148148 0.99074074 0.99074074 0.98148148]
mean value: 0.9871548821548821
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91525424 1. 1. 1. 0.96428571 0.98214286
0.96491228 0.98214286 0.98214286 0.96491228]
mean value: 0.975579308440593
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.6
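For context, the "MCC on Blind test" and "Accuracy on Blind test" lines reflect a single evaluation on the held-out blind test split after fitting on the training split. The sketch below is a hedged, self-contained illustration of that step; the toy data and the simple classifier stand in for the real rpoB splits and the pipeline above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import matthews_corrcoef, accuracy_score

# Toy data: one informative feature plus noise, split into train and blind test.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)
X_train, X_bts, y_train, y_bts = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))
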
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.07150984 0.06511164 0.06523538 0.06628895 0.09167719 0.06410146
0.07649589 0.09414434 0.06980395 0.06749415]
mean value: 0.07318627834320068
key: score_time
value: [0.02362418 0.01245999 0.01245952 0.01243114 0.01258183 0.01239157
0.01956177 0.02616715 0.01251602 0.02478385]
mean value: 0.016897702217102052
key: test_mcc
value: [0.89544301 0.91216737 0.89544301 1. 0.946411 0.98181211
0.87869377 0.89524142 0.98181211 0.89524142]
mean value: 0.9282265219647683
key: train_mcc
value: [0.97582662 0.96987162 0.9738378 0.96789422 0.96987162 0.96789632
0.96987347 0.97185442 0.96592295 0.97383919]
mean value: 0.9706688251163492
key: test_accuracy
value: [0.94495413 0.95412844 0.94495413 1. 0.97247706 0.99082569
0.93577982 0.94495413 0.99082569 0.94495413]
mean value: 0.9623853211009175
key: train_accuracy
value: [0.98776758 0.98470948 0.98674822 0.98369011 0.98470948 0.98369011
0.98470948 0.98572885 0.98267074 0.98674822]
mean value: 0.9851172273190621
key: test_fscore
value: [0.94736842 0.95575221 0.94736842 1. 0.97297297 0.99099099
0.94017094 0.94827586 0.99099099 0.94827586]
mean value: 0.964216667375847
key: train_fscore
value: [0.98792757 0.98495486 0.98693467 0.98396794 0.98495486 0.98393574
0.98492462 0.98591549 0.98294885 0.98690836]
mean value: 0.9853372967912892
key: test_precision
value: [0.9 0.91525424 0.9 1. 0.94736842 0.98214286
0.88709677 0.90163934 0.98214286 0.90163934]
mean value: 0.931728383534462
key: train_precision
value: [0.97614314 0.97035573 0.97420635 0.96844181 0.97035573 0.96837945
0.97029703 0.97222222 0.96646943 0.97415507]
mean value: 0.9711025963561587
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94545455 0.95454545 0.94545455 1. 0.97272727 0.99074074
0.93518519 0.94444444 0.99074074 0.94444444]
mean value: 0.9623737373737373
key: train_roc_auc
value: [0.9877551 0.98469388 0.98673469 0.98367347 0.98469388 0.98370672
0.98472505 0.98574338 0.98268839 0.98676171]
mean value: 0.9851176274990648
key: test_jcc
value: [0.9 0.91525424 0.9 1. 0.94736842 0.98214286
0.88709677 0.90163934 0.98214286 0.90163934]
mean value: 0.931728383534462
key: train_jcc
value: [0.97614314 0.97035573 0.97420635 0.96844181 0.97035573 0.96837945
0.97029703 0.97222222 0.96646943 0.97415507]
mean value: 0.9711025963561587
MCC on Blind test: 0.28
Accuracy on Blind test: 0.61
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01576996 0.0410161 0.04309177 0.01604724 0.01587033 0.01609635
0.0160675 0.01610398 0.01614022 0.01605773]
mean value: 0.021226119995117188
key: score_time
value: [0.01217413 0.01242757 0.01232624 0.01218367 0.01216888 0.01491857
0.01216054 0.01213479 0.01214075 0.01212192]
mean value: 0.012475705146789551
key: test_mcc
value: [0.68811051 0.67128761 0.72482321 0.65151515 0.56345904 0.67025938
0.74368478 0.71100747 0.67172877 0.72598622]
mean value: 0.6821862154657743
key: train_mcc
value: [0.68441792 0.69257749 0.67825101 0.68233052 0.69490253 0.68014841
0.67406705 0.68422743 0.68630482 0.68439564]
mean value: 0.6841622811613333
key: test_accuracy
value: [0.8440367 0.83486239 0.86238532 0.82568807 0.77981651 0.83486239
0.87155963 0.85321101 0.83486239 0.86238532]
mean value: 0.8403669724770643
key: train_accuracy
value: [0.84199796 0.84607543 0.83893986 0.84097859 0.8470948 0.83995923
0.83690112 0.84199796 0.84301733 0.84199796]
mean value: 0.8418960244648318
key: test_fscore
value: [0.8411215 0.82692308 0.85981308 0.82568807 0.78947368 0.83333333
0.87037037 0.84615385 0.83018868 0.85981308]
mean value: 0.8382878727182334
key: train_fscore
value: [0.83937824 0.84352332 0.83643892 0.83850932 0.84375 0.83764219
0.83436853 0.83971044 0.84057971 0.83904465]
mean value: 0.8392945323885889
key: test_precision
value: [0.8490566 0.86 0.86792453 0.81818182 0.75 0.8490566
0.88679245 0.89795918 0.8627451 0.88461538]
mean value: 0.8526331673189134
key: train_precision
value: [0.85443038 0.85864979 0.85052632 0.85263158 0.86353945 0.8490566
0.84663866 0.85115304 0.85294118 0.85412262]
mean value: 0.8533689606245336
key: test_recall
value: [0.83333333 0.7962963 0.85185185 0.83333333 0.83333333 0.81818182
0.85454545 0.8 0.8 0.83636364]
mean value: 0.8257239057239057
key: train_recall
value: [0.82484725 0.82892057 0.82281059 0.82484725 0.82484725 0.82653061
0.82244898 0.82857143 0.82857143 0.8244898 ]
mean value: 0.8256885157321584
key: test_roc_auc
value: [0.84393939 0.83451178 0.86228956 0.82575758 0.78030303 0.83501684
0.87171717 0.8537037 0.83518519 0.86262626]
mean value: 0.8405050505050505
key: train_roc_auc
value: [0.84201546 0.84609294 0.83895632 0.84099505 0.8471175 0.83994555
0.8368864 0.84198429 0.84300262 0.84198013]
mean value: 0.8418976266677751
key: test_jcc
value: [0.72580645 0.70491803 0.75409836 0.703125 0.65217391 0.71428571
0.7704918 0.73333333 0.70967742 0.75409836]
mean value: 0.7222008389007317
key: train_jcc
value: [0.72321429 0.72939068 0.71886121 0.72192513 0.72972973 0.72064057
0.71580817 0.72370766 0.725 0.72271914]
mean value: 0.7230996586219895
MCC on Blind test: 0.45
Accuracy on Blind test: 0.71
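One detail worth noting about the Multinomial run above: MultinomialNB only accepts non-negative features, and the pipeline satisfies this because MinMaxScaler maps every numeric column into [0, 1] while OneHotEncoder emits 0/1 indicators. A tiny illustration with made-up values:

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB

X = np.array([[-3.2, 10.0], [0.5, 2.0], [4.1, -7.5], [2.2, 0.0]])
y = np.array([0, 0, 1, 1])

X_scaled = MinMaxScaler().fit_transform(X)   # all values now lie in [0, 1]
MultinomialNB().fit(X_scaled, y)             # fitting on raw X would raise ValueError (negative values)
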
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0369277 0.04088545 0.03215814 0.03854275 0.03352094 0.03347206
0.0334053 0.02965021 0.04916883 0.03165388]
mean value: 0.03593852519989014
key: score_time
value: [0.01192832 0.01221871 0.01212454 0.01213908 0.01212454 0.0121572
0.0121696 0.01220918 0.01216412 0.01215053]
mean value: 0.012138581275939942
key: test_mcc
value: [0.96396098 0.96396098 0.98181818 0.98181818 0.946411 0.98181211
0.92905936 0.98181211 1. 0.92905936]
mean value: 0.9659712270812341
key: train_mcc
value: [0.99796333 0.99796333 0.98784133 0.98784133 0.97582662 0.98985784
0.98985784 0.9918781 0.99593082 0.9918781 ]
mean value: 0.9906838647005597
key: test_accuracy
value: [0.98165138 0.98165138 0.99082569 0.99082569 0.97247706 0.99082569
0.96330275 0.99082569 1. 0.96330275]
mean value: 0.9825688073394496
key: train_accuracy
value: [0.99898063 0.99898063 0.99388379 0.99388379 0.98776758 0.99490316
0.99490316 0.99592253 0.99796126 0.99592253]
mean value: 0.9953109072375127
key: test_fscore
value: [0.98181818 0.98181818 0.99082569 0.99082569 0.97297297 0.99099099
0.96491228 0.99099099 1. 0.96491228]
mean value: 0.9830067256141616
key: train_fscore
value: [0.99898271 0.99898271 0.99392713 0.99392713 0.98792757 0.99492386
0.99492386 0.99593496 0.99796334 0.99593496]
mean value: 0.9953428202965996
key: test_precision
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9668268707207155
key: train_precision
value: [0.99796748 0.99796748 0.98792757 0.98792757 0.97614314 0.98989899
0.98989899 0.99190283 0.99593496 0.99190283]
mean value: 0.9907471838451151
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98181818 0.98181818 0.99090909 0.99090909 0.97272727 0.99074074
0.96296296 0.99074074 1. 0.96296296]
mean value: 0.9825589225589225
key: train_roc_auc
value: [0.99897959 0.99897959 0.99387755 0.99387755 0.9877551 0.99490835
0.99490835 0.99592668 0.99796334 0.99592668]
mean value: 0.9953102788977097
key: test_jcc
value: [0.96428571 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.93220339 0.98214286 1. 0.93220339]
mean value: 0.9668268707207155
key: train_jcc
value: [0.99796748 0.99796748 0.98792757 0.98792757 0.97614314 0.98989899
0.98989899 0.99190283 0.99593496 0.99190283]
mean value: 0.9907471838451151
MCC on Blind test: 0.26
Accuracy on Blind test: 0.58
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02573824 0.02769852 0.02286506 0.0265317 0.02289176 0.02876425
0.02368808 0.02303743 0.02485371 0.02317619]
mean value: 0.024924492835998534
key: score_time
value: [0.0121398 0.01220655 0.01216602 0.01215792 0.01220179 0.01250315
0.01214266 0.01208639 0.01215744 0.01210952]
mean value: 0.01218712329864502
key: test_mcc
value: [0.946411 0.96396098 0.98181818 0.98181818 0.946411 0.98181211
0.91202529 0.98181211 1. 0.92905936]
mean value: 0.9625128216673053
key: train_mcc
value: [0.99593079 0.99593079 0.96789422 0.98382069 0.98784133 0.99796334
0.97981668 0.98582943 0.99593082 0.96592295]
mean value: 0.9856881042566935
key: test_accuracy
value: [0.97247706 0.98165138 0.99082569 0.99082569 0.97247706 0.99082569
0.95412844 0.99082569 1. 0.96330275]
mean value: 0.9807339449541285
key: train_accuracy
value: [0.99796126 0.99796126 0.98369011 0.99184506 0.99388379 0.99898063
0.98980632 0.99286442 0.99796126 0.98267074]
mean value: 0.9927624872579001
key: test_fscore
value: [0.97297297 0.98181818 0.99082569 0.99082569 0.97297297 0.99099099
0.95652174 0.99099099 1. 0.96491228]
mean value: 0.9812831505725088
key: train_fscore
value: [0.99796748 0.99796748 0.98396794 0.99191919 0.99392713 0.99898063
0.98989899 0.9929078 0.99796334 0.98294885]
mean value: 0.9928448822634005
key: test_precision
value: [0.94736842 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.91666667 0.98214286 1. 0.93220339]
mean value: 0.963581469081023
key: train_precision
value: [0.9959432 0.9959432 0.96844181 0.98396794 0.98792757 0.99796334
0.98 0.98591549 0.99593496 0.96646943]
mean value: 0.9858506946033496
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 0.99090909 0.99090909 0.97272727 0.99074074
0.9537037 0.99074074 1. 0.96296296]
mean value: 0.9807239057239057
key: train_roc_auc
value: [0.99795918 0.99795918 0.98367347 0.99183673 0.99387755 0.99898167
0.9898167 0.99287169 0.99796334 0.98268839]
mean value: 0.9927627914709672
key: test_jcc
value: [0.94736842 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286
0.91666667 0.98214286 1. 0.93220339]
mean value: 0.963581469081023
key: train_jcc
value: [0.9959432 0.9959432 0.96844181 0.98396794 0.98792757 0.99796334
0.98 0.98591549 0.99593496 0.96646943]
mean value: 0.9858506946033496
MCC on Blind test: 0.29
Accuracy on Blind test: 0.6
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.28126764 0.26553988 0.26455736 0.26455355 0.27714682 0.26926732
0.26987267 0.26936221 0.27386642 0.26629829]
mean value: 0.2701732158660889
key: score_time
value: [0.01591969 0.01593041 0.01601982 0.01616859 0.01612663 0.01737332
0.01592994 0.01732421 0.01614046 0.01601481]
mean value: 0.01629478931427002
key: test_mcc
value: [0.9291517 0.96396098 1. 1. 0.96396098 1.
0.96393713 0.98181211 1. 0.98181211]
mean value: 0.9784635026184957
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 0.98165138 1. 1. 0.98165138 1.
0.98165138 0.99082569 1. 0.99082569]
mean value: 0.9889908256880734
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.98181818 1. 1. 0.98181818 1.
0.98214286 0.99099099 1. 0.99099099]
mean value: 0.9892046917046917
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.96428571 1. 1. 0.96428571 1.
0.96491228 0.98214286 1. 0.98214286]
mean value: 0.9788803906317518
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96363636 0.98181818 1. 1. 0.98181818 1.
0.98148148 0.99074074 1. 0.99074074]
mean value: 0.989023569023569
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.96428571 1. 1. 0.96428571 1.
0.96491228 0.98214286 1. 0.98214286]
mean value: 0.9788803906317518
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.27
Accuracy on Blind test: 0.59
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.17862988 0.20598102 0.20765114 0.20498371 0.1792326 0.2004025
0.20543003 0.20129752 0.20281935 0.20100617]
mean value: 0.1987433910369873
key: score_time
value: [0.02799296 0.04068851 0.04115677 0.04074216 0.04036856 0.03043032
0.03932166 0.04003978 0.03971243 0.0401566 ]
mean value: 0.038060975074768064
key: test_mcc
value: [0.91216737 0.96396098 1. 1. 0.98181818 1.
0.94635821 0.98181211 0.96393713 0.94635821]
mean value: 0.9696412205023645
key: train_mcc
value: [1. 0.99796333 0.99593079 0.99593079 0.99390234 0.99593082
0.99390241 0.99593082 0.99796334 0.9918781 ]
mean value: 0.9959332732076022
key: test_accuracy
value: [0.95412844 0.98165138 1. 1. 0.99082569 1.
0.97247706 0.99082569 0.98165138 0.97247706]
mean value: 0.9844036697247707
key: train_accuracy
value: [1. 0.99898063 0.99796126 0.99796126 0.9969419 0.99796126
0.9969419 0.99796126 0.99898063 0.99592253]
mean value: 0.9979612640163099
key: test_fscore
value: [0.95575221 0.98181818 1. 1. 0.99082569 1.
0.97345133 0.99099099 0.98214286 0.97345133]
mean value: 0.9848432585282061
key: train_fscore
value: [1. 0.99898271 0.99796748 0.99796748 0.99695431 0.99796334
0.99694812 0.99796334 0.99898063 0.99593496]
mean value: 0.9979662369680692
key: test_precision
value: [0.91525424 0.96428571 1. 1. 0.98181818 1.
0.94827586 0.98214286 0.96491228 0.94827586]
mean value: 0.9704964995374574
key: train_precision
value: [1. 0.99796748 0.9959432 0.9959432 0.99392713 0.99593496
0.99391481 0.99593496 0.99796334 0.99190283]
mean value: 0.9959431915048893
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.98181818 1. 1. 0.99090909 1.
0.97222222 0.99074074 0.98148148 0.97222222]
mean value: 0.9843939393939394
key: train_roc_auc
value: [1. 0.99897959 0.99795918 0.99795918 0.99693878 0.99796334
0.99694501 0.99796334 0.99898167 0.99592668]
mean value: 0.9979616775427075
key: test_jcc
value: [0.91525424 0.96428571 1. 1. 0.98181818 1.
0.94827586 0.98214286 0.96491228 0.94827586]
mean value: 0.9704964995374574
key: train_jcc
value: [1. 0.99796748 0.9959432 0.9959432 0.99392713 0.99593496
0.99391481 0.99593496 0.99796334 0.99190283]
mean value: 0.9959431915048893
MCC on Blind test: 0.25
Accuracy on Blind test: 0.57
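The UserWarning logged before this Bagging run ("Some inputs do not have OOB scores") appears because BaggingClassifier defaults to only 10 base estimators, so some training samples end up in every bootstrap and never receive an out-of-bag prediction. A hedged sketch of the usual remedy is to raise n_estimators; the value below is an illustration, not a setting taken from the script.

from sklearn.ensemble import BaggingClassifier

# More estimators -> virtually every sample is left out of at least one
# bootstrap, so the OOB estimate (oob_score_) becomes well defined.
bc = BaggingClassifier(n_estimators=100, oob_score=True,
                       n_jobs=10, random_state=42)
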
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.47767258 0.4457314 0.63524914 0.67361498 0.61306643 0.56355143
0.53762197 0.56128907 0.43451262 0.60698318]
mean value: 0.5549292802810669
key: score_time
value: [0.02317858 0.026896 0.04104447 0.04110169 0.04912281 0.04197502
0.04222465 0.0552702 0.03481197 0.041291 ]
mean value: 0.0396916389465332
key: test_mcc
value: [0.946411 0.96396098 0.98181818 1. 0.98181818 1.
0.92905936 0.98181211 1. 0.98181211]
mean value: 0.9766691930577851
key: train_mcc
value: [0.99796333 0.99593079 0.99390234 0.99390234 0.99390234 0.99390241
0.99390241 0.99390241 0.99390241 0.99390241]
mean value: 0.9945113200343811
key: test_accuracy
value: [0.97247706 0.98165138 0.99082569 1. 0.99082569 1.
0.96330275 0.99082569 1. 0.99082569]
mean value: 0.9880733944954129
key: train_accuracy
value: [0.99898063 0.99796126 0.9969419 0.9969419 0.9969419 0.9969419
0.9969419 0.9969419 0.9969419 0.9969419 ]
mean value: 0.9972477064220183
key: test_fscore
value: [0.97297297 0.98181818 0.99082569 1. 0.99082569 1.
0.96491228 0.99099099 1. 0.99099099]
mean value: 0.988333679362168
key: train_fscore
value: [0.99898271 0.99796748 0.99695431 0.99695431 0.99695431 0.99694812
0.99694812 0.99694812 0.99694812 0.99694812]
mean value: 0.9972553719869787
key: test_precision
value: [0.94736842 0.96428571 0.98181818 1. 0.98181818 1.
0.93220339 0.98214286 1. 0.98214286]
mean value: 0.9771779603090932
key: train_precision
value: [0.99796748 0.9959432 0.99392713 0.99392713 0.99392713 0.99391481
0.99391481 0.99391481 0.99391481 0.99391481]
mean value: 0.9945266097572326
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.98181818 0.99090909 1. 0.99090909 1.
0.96296296 0.99074074 1. 0.99074074]
mean value: 0.9880808080808081
key: train_roc_auc
value: [0.99897959 0.99795918 0.99693878 0.99693878 0.99693878 0.99694501
0.99694501 0.99694501 0.99694501 0.99694501]
mean value: 0.9972480152957314
key: test_jcc
value: [0.94736842 0.96428571 0.98181818 1. 0.98181818 1.
0.93220339 0.98214286 1. 0.98214286]
mean value: 0.9771779603090932
key: train_jcc
value: [0.99796748 0.9959432 0.99392713 0.99392713 0.99392713 0.99391481
0.99391481 0.99391481 0.99391481 0.99391481]
mean value: 0.9945266097572326
MCC on Blind test: 0.16
Accuracy on Blind test: 0.54
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.11533022 1.10044742 1.10230374 1.10466123 1.10110593 1.12116337
1.1251564 1.11318278 1.11536574 1.13156414]
mean value: 1.11302809715271
key: score_time
value: [0.00962567 0.00945687 0.00967073 0.00957179 0.00962257 0.00992775
0.010005 0.0098331 0.01052403 0.01036096]
mean value: 0.009859848022460937
key: test_mcc
value: [0.9291517 0.98181818 1. 0.98181818 0.946411 0.98181211
0.94635821 0.96393713 0.98181211 0.94635821]
mean value: 0.9659476849273602
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96330275 0.99082569 1. 0.99082569 0.97247706 0.99082569
0.97247706 0.98165138 0.99082569 0.97247706]
mean value: 0.9825688073394496
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.99082569 1. 0.99082569 0.97297297 0.99099099
0.97345133 0.98214286 0.99099099 0.97345133]
mean value: 0.9829937557397572
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.98181818 1. 0.98181818 0.94736842 0.98214286
0.94827586 0.96491228 0.98214286 0.94827586]
mean value: 0.9667788986573016
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96363636 0.99090909 1. 0.99090909 0.97272727 0.99074074
0.97222222 0.98148148 0.99074074 0.97222222]
mean value: 0.9825589225589225
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.98181818 1. 0.98181818 0.94736842 0.98214286
0.94827586 0.96491228 0.98214286 0.94827586]
mean value: 0.9667788986573016
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.59
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04244852 0.04812479 0.04577637 0.0463593 0.04909849 0.04620719
0.04703236 0.05215859 0.04779005 0.04598045]
mean value: 0.04709761142730713
key: score_time
value: [0.01303792 0.02137995 0.01299429 0.01325035 0.02061152 0.01301885
0.01324797 0.01297951 0.02275825 0.01454926]
mean value: 0.015782785415649415
key: test_mcc
value: [1. 1. 1. 1. 1. 1.
0.34851405 1. 1. 1. ]
mean value: 0.9348514050091998
key: train_mcc
value: [1. 1. 1. 1. 1. 0.96198449
0.35025059 1. 1. 1. ]
mean value: 0.9312235085917535
key: test_accuracy
value: [1. 1. 1. 1. 1. 1.
0.60550459 1. 1. 1. ]
mean value: 0.9605504587155963
key: train_accuracy
value: [1. 1. 1. 1. 1. 0.98063201
0.60958206 1. 1. 1. ]
mean value: 0.9590214067278288
key: test_fscore
value: [1. 1. 1. 1. 1. 1.
0.35820896 1. 1. 1. ]
mean value: 0.935820895522388
key: train_fscore
value: [1. 1. 1. 1. 1. 0.98022893
0.35845896 1. 1. 1. ]
mean value: 0.9338687889673829
key: test_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1.
0.21818182 1. 1. 1. ]
mean value: 0.9218181818181819
key: train_recall
value: [1. 1. 1. 1. 1. 0.96122449
0.21836735 1. 1. 1. ]
mean value: 0.9179591836734694
key: test_roc_auc
value: [1. 1. 1. 1. 1. 1.
0.60909091 1. 1. 1. ]
mean value: 0.9609090909090909
key: train_roc_auc
value: [1. 1. 1. 1. 1. 0.98061224
0.60918367 1. 1. 1. ]
mean value: 0.9589795918367346
key: test_jcc
value: [1. 1. 1. 1. 1. 1.
0.21818182 1. 1. 1. ]
mean value: 0.9218181818181819
key: train_jcc
value: [1. 1. 1. 1. 1. 0.96122449
0.21836735 1. 1. 1. ]
mean value: 0.9179591836734694
MCC on Blind test: 0.0
Accuracy on Blind test: 0.51
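
Note: the repeated "Variables are collinear" UserWarnings interleaved with the fold output above are raised by the discriminant-analysis models while fitting the feature matrix (many of the AAindex columns are presumably near-collinear). One common mitigation, shown here on toy data as an assumption rather than as part of the logged script, is to add shrinkage through QDA's reg_param so the per-class scalings stay away from zero (the rank warning itself may still be emitted):

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X = np.hstack([X, X[:, :2]])        # duplicated columns -> rank-deficient, collinear design
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# reg_param shrinks each class covariance estimate, keeping the discriminant
# well conditioned even though the design matrix is collinear
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
qda.fit(X, y)
print(round(qda.score(X, y), 3))
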
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02674341 0.03784299 0.04292107 0.01843095 0.02614713 0.0395968
0.01892185 0.01898193 0.01882291 0.02836633]
mean value: 0.027677536010742188
key: score_time
value: [0.01909041 0.01905203 0.01903725 0.01596856 0.01496267 0.0134511
0.01707458 0.01234651 0.0223639 0.02151322]
mean value: 0.017486023902893066
key: test_mcc
value: [0.946411 0.9291517 1. 0.98181818 0.946411 0.98181211
0.91202529 0.94635821 1. 0.91202529]
mean value: 0.9556012783200151
key: train_mcc
value: [0.96789422 0.96592058 0.96789422 0.96592058 0.96789422 0.96592295
0.96987347 0.96987347 0.96198744 0.97185442]
mean value: 0.9675035590265852
key: test_accuracy
value: [0.97247706 0.96330275 1. 0.99082569 0.97247706 0.99082569
0.95412844 0.97247706 1. 0.95412844]
mean value: 0.9770642201834863
key: train_accuracy
value: [0.98369011 0.98267074 0.98369011 0.98267074 0.98369011 0.98267074
0.98470948 0.98470948 0.98063201 0.98572885]
mean value: 0.9834862385321101
key: test_fscore
value: [0.97297297 0.96428571 1. 0.99082569 0.97297297 0.99099099
0.95652174 0.97345133 1. 0.95652174]
mean value: 0.9778543144990544
key: train_fscore
value: [0.98396794 0.98298298 0.98396794 0.98298298 0.98396794 0.98294885
0.98492462 0.98492462 0.98098098 0.98591549]
mean value: 0.9837564340290699
key: test_precision
value: [0.94736842 0.93103448 1. 0.98181818 0.94736842 0.98214286
0.91666667 0.94827586 1. 0.91666667]
mean value: 0.9571341559227221
key: train_precision
value: [0.96844181 0.96653543 0.96844181 0.96653543 0.96844181 0.96646943
0.97029703 0.97029703 0.96267191 0.97222222]
mean value: 0.9680353925262213
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.96363636 1. 0.99090909 0.97272727 0.99074074
0.9537037 0.97222222 1. 0.9537037 ]
mean value: 0.977037037037037
key: train_roc_auc
value: [0.98367347 0.98265306 0.98367347 0.98265306 0.98367347 0.98268839
0.98472505 0.98472505 0.98065173 0.98574338]
mean value: 0.9834860135500229
key: test_jcc
value: [0.94736842 0.93103448 1. 0.98181818 0.94736842 0.98214286
0.91666667 0.94827586 1. 0.91666667]
mean value: 0.9571341559227221
key: train_jcc
value: [0.96844181 0.96653543 0.96844181 0.96653543 0.96844181 0.96646943
0.97029703 0.97029703 0.96267191 0.97222222]
mean value: 0.9680353925262213
MCC on Blind test: 0.3
Accuracy on Blind test: 0.61
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:155: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:158: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.26458216 0.32423019 0.28895402 0.36766052 0.35149288 0.44386435
0.33424354 0.35568094 0.34150338 0.36406851]
mean value: 0.34362804889678955
key: score_time
value: [0.01919508 0.01234579 0.0192287 0.02447915 0.01916623 0.01922035
0.0209403 0.01921153 0.02182055 0.01927543]
mean value: 0.019488310813903807
key: test_mcc
value: [0.946411 0.9291517 1. 0.98181818 0.96396098 0.94635821
0.91202529 0.94635821 0.98181211 0.91202529]
mean value: 0.9519920982298953
key: train_mcc
value: [0.960022 0.96592058 0.956108 0.95806317 0.960022 0.96198744
0.96198744 0.96987347 0.95611192 0.96592295]
mean value: 0.961601897888028
key: test_accuracy
value: [0.97247706 0.96330275 1. 0.99082569 0.98165138 0.97247706
0.95412844 0.97247706 0.99082569 0.95412844]
mean value: 0.9752293577981652
key: train_accuracy
value: [0.97961264 0.98267074 0.9775739 0.97859327 0.97961264 0.98063201
0.98063201 0.98470948 0.9775739 0.98267074]
mean value: 0.980428134556575
key: test_fscore
value: [0.97297297 0.96428571 1. 0.99082569 0.98181818 0.97345133
0.95652174 0.97345133 0.99099099 0.95652174]
mean value: 0.9760839681269381
key: train_fscore
value: [0.98003992 0.98298298 0.97808765 0.97906281 0.98003992 0.98098098
0.98098098 0.98492462 0.97804391 0.98294885]
mean value: 0.9808092628062846
key: test_precision
value: [0.94736842 0.93103448 1. 0.98181818 0.96428571 0.94827586
0.91666667 0.94827586 0.98214286 0.91666667]
mean value: 0.953653471452927
key: train_precision
value: [0.96086106 0.96653543 0.95711501 0.95898438 0.96086106 0.96267191
0.96267191 0.97029703 0.95703125 0.96646943]
mean value: 0.9623498450426142
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97272727 0.96363636 1. 0.99090909 0.98181818 0.97222222
0.9537037 0.97222222 0.99074074 0.9537037 ]
mean value: 0.9751683501683501
key: train_roc_auc
value: [0.97959184 0.98265306 0.97755102 0.97857143 0.97959184 0.98065173
0.98065173 0.98472505 0.97759674 0.98268839]
mean value: 0.9804272829294651
key: test_jcc
value: [0.94736842 0.93103448 1. 0.98181818 0.96428571 0.94827586
0.91666667 0.94827586 0.98214286 0.91666667]
mean value: 0.953653471452927
key: train_jcc
value: [0.96086106 0.96653543 0.95711501 0.95898438 0.96086106 0.96267191
0.96267191 0.97029703 0.95703125 0.96646943]
mean value: 0.9623498450426142
MCC on Blind test: 0.3
Accuracy on Blind test: 0.61
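
Note: the two SettingWithCopyWarning messages printed just before the Ridge ClassifierCV pipeline come from rpob_rt.py sorting DataFrame slices (ros_CT and ros_BT) in place. A small sketch of the usual remedy, using a hypothetical stand-in frame rather than the script's actual objects: take an explicit copy of the slice (or assign the sorted result) so the in-place sort no longer targets a view.

import pandas as pd

# hypothetical stand-in for the per-model score frames sliced inside rpob_rt.py
scores = pd.DataFrame({'model': ['Logistic Regression', 'Ridge Classifier'],
                       'test_mcc': [0.97, 0.96],
                       'bts_mcc': [0.25, 0.30]})

ros_CT = scores[['model', 'test_mcc']].copy()   # explicit copy, not a view of 'scores'
ros_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

ros_BT = scores[['model', 'bts_mcc']].copy()
ros_BT.sort_values(by=['bts_mcc'], ascending=False, inplace=True)
print(ros_CT)
print(ros_BT)
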
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01922441 0.01973104 0.0211575 0.02013302 0.02118015 0.01990414
0.01957083 0.02115703 0.01995182 0.02084684]
mean value: 0.020285677909851075
key: score_time
value: [0.01134562 0.01130247 0.01128507 0.01133657 0.01124454 0.0112586
0.01131725 0.01126814 0.01121068 0.01123452]
mean value: 0.011280345916748046
key: test_mcc
value: [0.70710678 0.4472136 1. 0.70710678 0.70710678 1.
0.70710678 1. 0.33333333 0.33333333]
mean value: 0.6942307386912815
key: train_mcc
value: [0.96362411 1. 0.96362411 0.96362411 1. 0.96362411
0.96362411 0.96362411 0.96362411 1. ]
mean value: 0.9745368781616021
key: test_accuracy
value: [0.83333333 0.66666667 1. 0.83333333 0.83333333 1.
0.83333333 1. 0.66666667 0.66666667]
mean value: 0.8333333333333334
key: train_accuracy
value: [0.98148148 1. 0.98148148 0.98148148 1. 0.98148148
0.98148148 0.98148148 0.98148148 1. ]
mean value: 0.987037037037037
key: test_fscore
value: [0.8 0.75 1. 0.8 0.8 1.
0.85714286 1. 0.66666667 0.66666667]
mean value: 0.834047619047619
key: train_fscore
value: [0.98181818 1. 0.98181818 0.98181818 1. 0.98181818
0.98181818 0.98181818 0.98181818 1. ]
mean value: 0.9872727272727273
key: test_precision
value: [1. 0.6 1. 1. 1. 1.
0.75 1. 0.66666667 0.66666667]
mean value: 0.8683333333333333
key: train_precision
value: [0.96428571 1. 0.96428571 0.96428571 1. 0.96428571
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.975
key: test_recall
value: [0.66666667 1. 1. 0.66666667 0.66666667 1.
1. 1. 0.66666667 0.66666667]
mean value: 0.8333333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83333333 0.66666667 1. 0.83333333 0.83333333 1.
0.83333333 1. 0.66666667 0.66666667]
mean value: 0.8333333333333334
key: train_roc_auc
value: [0.98148148 1. 0.98148148 0.98148148 1. 0.98148148
0.98148148 0.98148148 0.98148148 1. ]
mean value: 0.987037037037037
key: test_jcc
value: [0.66666667 0.6 1. 0.66666667 0.66666667 1.
0.75 1. 0.5 0.5 ]
mean value: 0.735
key: train_jcc
value: [0.96428571 1. 0.96428571 0.96428571 1. 0.96428571
0.96428571 0.96428571 0.96428571 1. ]
mean value: 0.975
Traceback (most recent call last):
File "/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py", line 165, in <module>
mm_skf_scoresD4 = MultModelsCl(input_df = X_rus
File "/home/tanu/git/LSHTM_analysis/scripts/ml/MultModelsCl.py", line 240, in MultModelsCl
bts_predict = model_pipeline.predict(blind_test_input_df)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
Xt = transform.transform(Xt)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
Xs = self._fit_transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
return Parallel(n_jobs=self.n_jobs)(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
while self.dispatch_one_batch(iterator):
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
self._dispatch(tasks)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
res = transformer.transform(X)
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
X_int, X_mask = self._transform(
File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
raise ValueError(msg)
ValueError: Found unknown categories ['Other'] in column 5 during transform
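
Note: the traceback above stops the run. The pipeline's OneHotEncoder was fitted on the random-undersampled training set (X_rus), which presumably no longer contains the 'Other' level of categorical column 5 (drtype_mode_labels in the Index shown above), so transforming the blind-test data raises ValueError. A minimal, self-contained sketch of one way around this, on toy data rather than the rpoB feature set: fit the encoder with handle_unknown='ignore' so unseen levels become all-zero indicator rows instead of errors.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# toy stand-ins: one numeric and one categorical column
X_train = pd.DataFrame({'ligand_distance': [1.2, 3.4, 5.6, 7.8],
                        'drtype_mode_labels': ['A', 'B', 'A', 'B']})
y_train = [0, 1, 0, 1]
X_blind = pd.DataFrame({'ligand_distance': [2.0],
                        'drtype_mode_labels': ['Other']})   # level unseen in training

prep = ColumnTransformer(
    remainder='passthrough',
    transformers=[('num', MinMaxScaler(), ['ligand_distance']),
                  # handle_unknown='ignore' encodes unseen levels as all zeros
                  # instead of raising ValueError at transform/predict time
                  ('cat', OneHotEncoder(handle_unknown='ignore'),
                   ['drtype_mode_labels'])])

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegression(random_state=42))])
pipe.fit(X_train, y_train)
print(pipe.predict(X_blind))   # runs; with the default encoder this would raise

An alternative that stays closer to the logged setup would be to harmonise the category levels before splitting (for example, re-mapping or dropping rare levels such as 'Other') so the training and blind-test sets share the same categories.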