LSHTM_analysis/scripts/ml/log_rpob_config.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
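The SettingWithCopyWarning above is raised because `mask_check` is a filtered slice of another DataFrame that is then sorted in place. A minimal sketch of the usual remedy is below: take an explicit `.copy()` of the slice before modifying it. The toy table and the filter condition are hypothetical; only `mask_check` and `ligand_distance` come from the log.

```python
import pandas as pd

# Hypothetical stand-in for the real features table
df = pd.DataFrame({
    "mutation": ["a1b", "c2d", "e3f", "g4h"],
    "ligand_distance": [12.5, 3.1, 8.0, 1.2],
})

# Boolean filtering can yield an object pandas treats as a possible view of df.
# .copy() makes mask_check an independent DataFrame, so the in-place sort
# below modifies only mask_check and does not trigger SettingWithCopyWarning.
mask_check = df[df["ligand_distance"] > 2].copy()
mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)
print(mask_check)
```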
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
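The two cleaning steps reported above (keep only numerical columns, then drop any column containing NA) can be sketched with pandas as follows. This is an illustration under assumptions: the miniature `aaindex_df` and its column names (`aa_code`, `idx_a`, `idx_b`, `idx_c`) are hypothetical, and the script's own implementation is not shown in the log.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature aaindex table: two non-numerical columns and
# three numerical columns, one of which contains a missing value.
aaindex_df = pd.DataFrame({
    "mutationinformation": ["L10P", "S450L", "H445Y"],
    "aa_code": ["L", "S", "H"],
    "idx_a": [0.1, 0.5, 0.9],
    "idx_b": [1.0, np.nan, 3.0],
    "idx_c": [7, 8, 9],
})

# Step 1: keep numerical columns only
aa_num = aaindex_df.select_dtypes(include="number")

# Step 2: find and drop columns that contain any NA
na_cols = aa_num.columns[aa_num.isna().any()]
aa_clean = aa_num.drop(columns=na_cols)

print("ncols with NA:", len(na_cols))
print("Original ncols:", aa_num.shape[1])
print("Revised df ncols:", aa_clean.shape[1])

# Sanity check mirroring the log's PASS message
assert not aa_clean.isna().any().any()
```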
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 1133

PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation

or_mychisq          339
log10_or_mychisq    339
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

No. of numerical features: 46
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: True

index: 2
ind: 3

Mask count check: True
Original Data
 Counter({0: 282, 1: 275}) Data dim: (557, 53)

-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (557, 53)
Test data size: (575, 53)
y_train numbers: Counter({0: 282, 1: 275})
y_train ratio: 1.0254545454545454

y_test_numbers: Counter({0: 545, 1: 30})
y_test ratio: 18.166666666666668
-------------------------------------------------------------
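The reported ratios are consistent with dividing the class-0 count by the class-1 count (282/275 and 545/30). A minimal sketch reproducing them from label lists built to match the logged counts; the helper name `class_ratio` is hypothetical.

```python
from collections import Counter

# Labels built to reproduce the class counts reported for each split
y_train = [0] * 282 + [1] * 275
y_test = [0] * 545 + [1] * 30

def class_ratio(labels):
    """Count of class 0 divided by count of class 1."""
    counts = Counter(labels)
    return counts[0] / counts[1]

print(Counter(y_train), class_ratio(y_train))
print(Counter(y_test), class_ratio(y_test))
```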
Simple Random OverSampling
 Counter({0: 282, 1: 282})
(564, 53)
Simple Random UnderSampling
 Counter({0: 275, 1: 275})
(550, 53)
Simple Combined Over and UnderSampling
 Counter({0: 282, 1: 282})
(564, 53)
SMOTE_NC OverSampling
 Counter({0: 282, 1: 282})
(564, 53)
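The simple over- and undersampling results above (564 rows at 282/282, 550 rows at 275/275) can be reproduced with a hand-written NumPy sketch. It is an assumption that the script itself uses a library such as imbalanced-learn; this sketch only illustrates random resampling and does not reproduce SMOTE-NC. The function names are hypothetical and the feature matrix is a zero-filled placeholder with the logged training shape.

```python
import numpy as np
from collections import Counter

def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=n_max - n, replace=True)
        idx.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, seed=42):
    """Keep a random subset of each class equal in size to the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=n_min, replace=False)
        for cls in classes
    ])
    return X[idx], y[idx]

# Placeholder training data with the logged shape and class counts
X = np.zeros((557, 53))
y = np.array([0] * 282 + [1] * 275)

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
print(Counter(y_ros.tolist()), X_ros.shape)
print(Counter(y_rus.tolist()), X_rus.shape)
```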

#####################################################################

Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: rpoB
Drug name: rifampicin

Output directory: /home/tanu/git/Data/rifampicin/output/ml/uq_v1/

Sanity checks:
Total input features: 53

Training data size: (557, 53)
Test data size: (575, 53)

Target feature numbers (training data): Counter({0: 282, 1: 275})
Target features ratio (training data): 1.0254545454545454

Target feature numbers (test data): Counter({0: 545, 1: 30})
Target features ratio (test data): 18.166666666666668

#####################################################################


================================================================

Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match
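The feature-count check can be reproduced by summing the group sizes listed above: 11 + 23 + 3 = 37 structural, plus 3 evolutionary and 6 genomic gives the 46 numerical features, and adding the 7 categorical features gives the 53 input features. A minimal sketch, assuming the script's check compares this total with the number of input columns (the check itself is not shown in the log):

```python
# Feature groups exactly as listed in the log
common_stability = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
                    'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
                    'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
foldx = ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss',
         'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss',
         'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss',
         'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss',
         'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss',
         'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
other_struc = ['rsa', 'kd_values', 'rd_values']
evolutionary = ['consurf_score', 'snap2_score', 'provean_score']
genomic = ['maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion',
           'lineage_count_all', 'lineage_count_unique']
categorical = ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change',
               'water_change', 'drtype_mode_labels', 'active_site']

structural = common_stability + foldx + other_struc
numerical = structural + evolutionary + genomic
all_features = numerical + categorical

print("Structural features (n):", len(structural))
print("Numerical features (n):", len(numerical))
print("Total input features:", len(all_features))

# Every feature should appear once, and the total should match the input width
assert len(set(all_features)) == len(all_features) == 53
```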

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
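The pipeline printed above (MinMaxScaler on numerical columns, OneHotEncoder on categorical columns, then LogisticRegression) can be sketched with scikit-learn as below. The lbfgs ConvergenceWarnings later in this log themselves suggest increasing `max_iter`, so this sketch raises it from the default of 100. The data are synthetic, the three columns are a hypothetical subset of the real features, and the 10-fold stratified CV with an MCC scorer is an assumption chosen to yield the same result keys the log prints (`fit_time`, `score_time`, `test_mcc`, `train_mcc`).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in data: two numerical features and one categorical feature
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] < 15).astype(int).to_numpy()

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        remainder="passthrough",
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ],
    )),
    # max_iter raised from the default (100) to give lbfgs more iterations
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    pipe, X, y, cv=cv,
    scoring={"mcc": make_scorer(matthews_corrcoef)},
    return_train_score=True,
)
for key, value in scores.items():
    print("key:", key, "mean value:", np.mean(value))
```

Scaling the numerical columns, as the pipeline already does, also helps lbfgs converge; switching `solver` is the other option the warning mentions.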

key: fit_time
value: [0.02604318 0.02941751 0.02129436 0.0241468  0.02635837 0.02950144
 0.02385283 0.02729487 0.02373672 0.02400374]

mean value: 0.025564980506896973

key: score_time
value: [0.01137805 0.01106167 0.01085043 0.0108633  0.01163006 0.01123095
 0.01119161 0.01098061 0.01095009 0.01113772]

mean value: 0.011127448081970215

key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.82195294 0.71611487 0.85933785
 0.75047877 0.78174603 0.71735629 0.8565805 ]

mean value: 0.8149527494116898
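MCC is computed from the binary confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and the logged "mean value" appears to be the plain arithmetic mean of the ten per-fold scores. A minimal sketch of both; the helper name `mcc` is hypothetical, and the fold values are the rounded `test_mcc` numbers printed above, so their mean matches the logged value only to about 1e-8.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 by convention when any marginal sum is zero,
    because the denominator would then be 0.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Rounded per-fold test_mcc values from the log
test_mcc = [0.93103448, 0.82149863, 0.89342711, 0.82195294, 0.71611487,
            0.85933785, 0.75047877, 0.78174603, 0.71735629, 0.8565805]
mean_test_mcc = sum(test_mcc) / len(test_mcc)
print(mean_test_mcc)
```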

key: train_mcc
value: [0.8246123  0.83651026 0.81662709 0.8246123  0.84078809 0.82921429
 0.8366859  0.8249619  0.81699263 0.82954689]

mean value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
0.8280551641651794

key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.91071429 0.85714286 0.92857143
 0.875      0.89090909 0.85454545 0.92727273]

mean value: 0.9065584415584416
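Each metric lists one score per cross-validation fold, and "mean value" is their unweighted mean. The split object itself is not shown in this log; assuming the 557 rows reported earlier are divided so that the first 557 % 10 = 7 folds hold 56 rows and the last 3 hold 55 (the layout KFold uses), the fold accuracies above are consistent with it (0.96428571 = 54/56, 0.89090909 = 49/55). A stdlib sketch with a hypothetical helper `fold_sizes`:

```python
def fold_sizes(n_samples, n_folds):
    """Rows per test fold when the first n_samples % n_folds folds get one extra row."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]

sizes = fold_sizes(557, 10)  # seven folds of 56 rows, three of 55

# Logistic Regression test_accuracy per fold, copied from the log above
test_accuracy = [0.96428571, 0.91071429, 0.94642857, 0.91071429, 0.85714286,
                 0.92857143, 0.875, 0.89090909, 0.85454545, 0.92727273]

# Correctly classified rows per fold (accuracy x fold size, rounded)
n_correct = [round(acc * size) for acc, size in zip(test_accuracy, sizes)]

# "mean value" is the unweighted mean over folds...
mean_value = sum(test_accuracy) / len(test_accuracy)
# ...which differs slightly from the pooled accuracy over all 557 rows
pooled = sum(n_correct) / sum(sizes)
print(round(mean_value, 6), round(pooled, 6))  # 0.906558 0.906643
```

Because the folds differ in size by one row, the fold mean and the pooled accuracy disagree only in the fourth decimal place here.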

key: train_accuracy
value: [0.91217565 0.91816367 0.90818363 0.91217565 0.92015968 0.91417166
 0.91816367 0.9123506  0.90836653 0.91434263]

mean value: 0.9138253373730626

key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.9122807  0.86206897 0.92592593
 0.87719298 0.88888889 0.86206897 0.92307692]

mean value: 0.9068817865833584

key: train_fscore
value: [0.9123506  0.91816367 0.908      0.912      0.92031873 0.91485149
 0.91816367 0.9123506  0.90836653 0.91518738]

mean value: 0.9139752661367001

key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.89655172 0.83333333 0.96153846
 0.86206897 0.88888889 0.80645161 0.96      ]

mean value: 0.8993978874913247

key: train_precision
value: [0.9015748  0.90909091 0.8972332  0.90118577 0.90588235 0.89534884
 0.90551181 0.9015748  0.8976378  0.8957529 ]

mean value: 0.9010793179924724

key: test_recall
value: [1.         0.88888889 0.96428571 0.92857143 0.89285714 0.89285714
 0.89285714 0.88888889 0.92592593 0.88888889]

mean value: 0.9164021164021164

key: train_recall
value: [0.9233871  0.92741935 0.91902834 0.92307692 0.93522267 0.93522267
 0.93117409 0.9233871  0.91935484 0.93548387]

mean value: 0.9272756954420791

key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.91071429 0.85714286 0.92857143
 0.875      0.89087302 0.85582011 0.9265873 ]

mean value: 0.9066616493340631

key: train_roc_auc
value: [0.91228643 0.91825513 0.90833307 0.91232586 0.92036724 0.91446173
 0.91834295 0.91248095 0.90849632 0.91459233]

mean value: 0.9139942013981738

key: test_jcc
value: [0.93103448 0.82758621 0.9        0.83870968 0.75757576 0.86206897
 0.78125    0.8        0.75757576 0.85714286]

mean value: 0.8312943704886141

key: train_jcc
value: [0.83882784 0.84870849 0.83150183 0.83823529 0.85239852 0.84306569
 0.84870849 0.83882784 0.83211679 0.84363636]

mean value: 0.8416027146818327
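Every per-fold metric derives from a 2x2 confusion matrix, which this log does not print. As an inferred (not logged) reconstruction, fold 1 of the Logistic Regression run above is consistent with TP = 27, FP = 2, FN = 0, TN = 27 in a 56-row fold: those counts reproduce the fold-1 accuracy, precision, recall, F1 and Jaccard values. A stdlib sketch of the standard formulas; `binary_metrics` is a hypothetical helper, not part of the script:

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
        "jcc": tp / (tp + fp + fn),             # Jaccard index of the positive class
    }

# Counts inferred (not printed in the log) for fold 1 of Logistic Regression
m = binary_metrics(tp=27, fp=2, fn=0, tn=27)
print(m)  # accuracy ~0.96428571, precision ~0.93103448, recall 1.0, ...
```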

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
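The model list above contains the entry `('Naive Bayes', BernoulliNB())` twice, so a loop over it will presumably fit and report that model twice, and any results keyed by name could be overwritten. A small stdlib check for repeated names, run on the names copied from the list (the estimators are replaced by `None`; `duplicate_names` is a hypothetical helper):

```python
from collections import Counter

def duplicate_names(models):
    """Return names that occur more than once in a list of (name, estimator) pairs."""
    counts = Counter(name for name, _ in models)
    return [name for name, n in counts.items() if n > 1]

# Names copied from the "List of models" line; estimators omitted
names = ["Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
         "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees",
         "Extra Tree", "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost",
         "LDA", "Multinomial", "Passive Aggresive", "Stochastic GDescent",
         "AdaBoost Classifier", "Bagging Classifier", "Gaussian Process",
         "Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV"]
models = [(name, None) for name in names]
print(duplicate_names(models))  # ['Naive Bayes']
```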
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
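The `prep` step applies `MinMaxScaler` to the numerical columns and `OneHotEncoder` to the seven categorical ones, passing any other column through unchanged (`remainder='passthrough'`). Assuming the scores come from scikit-learn's `cross_validate`, as the `fit_time`/`score_time`/`test_*`/`train_*` keys suggest, both transformers are refit on each training fold. A stdlib sketch of what each does to a single column; the helper names and the column values are made up for illustration:

```python
def min_max_scale(values):
    """Rescale a column to [0, 1] using the column's own min and max.

    Constant columns map to 0.0, mirroring how MinMaxScaler guards
    against a zero range.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def one_hot(values):
    """One indicator column per distinct category, categories sorted."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical values for one numerical and one categorical column
print(min_max_scale([2.0, 5.0, 8.0]))     # [0.0, 0.5, 1.0]
print(one_hot(["helix", "coil", "helix"]))  # (['coil', 'helix'], [[0, 1], [1, 0], [0, 1]])
```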

key: fit_time
value: [0.6813972  0.78868866 0.81215572 0.72905326 0.82197165 0.74409461
 0.71482921 0.76164103 0.68979573 0.75526428]

mean value: 0.7498891353607178

key: score_time
value: [0.01248741 0.01272368 0.01245189 0.01255608 0.01290011 0.01112986
 0.01275086 0.012429   0.01255012 0.01147771]

mean value: 0.012345671653747559

key: test_mcc
value: [0.96481304 0.9284802  0.92857143 0.89802651 0.64285714 0.8660254
 0.82195294 0.89153439 0.82337971 0.8565805 ]

mean value: 0.8622221274212197

key: train_mcc
value: [0.91621503 0.94017409 0.93217802 0.93212612 0.95608442 0.92815126
 0.92815126 0.94040302 0.916326   0.93624587]

mean value: 0.932605508864504

key: test_accuracy
value: [0.98214286 0.96428571 0.96428571 0.94642857 0.82142857 0.92857143
 0.91071429 0.94545455 0.90909091 0.92727273]

mean value: 0.9299675324675325

key: train_accuracy
value: [0.95808383 0.97005988 0.96606786 0.96606786 0.97804391 0.96407186
 0.96407186 0.97011952 0.95816733 0.96812749]

mean value: 0.9662881408497745

key: test_fscore
value: [0.98113208 0.96296296 0.96428571 0.94915254 0.82142857 0.92307692
 0.9122807  0.94545455 0.9122807  0.92307692]

mean value: 0.9295131661638991

key: train_fscore
value: [0.95740365 0.96957404 0.96537678 0.96551724 0.97768763 0.96341463
 0.96341463 0.9694501  0.95757576 0.96774194]

mean value: 0.9657156401043632

key: test_precision
value: [1.         0.96296296 0.96428571 0.90322581 0.82142857 1.
 0.89655172 0.92857143 0.86666667 0.96      ]

mean value: 0.9303692874504887

key: train_precision
value: [0.96326531 0.9755102  0.97131148 0.96747967 0.9796748  0.96734694
 0.96734694 0.97942387 0.95951417 0.96774194]

mean value: 0.9698615308546767

key: test_recall
value: [0.96296296 0.96296296 0.96428571 1.         0.82142857 0.85714286
 0.92857143 0.96296296 0.96296296 0.88888889]

mean value: 0.9312169312169312

key: train_recall
value: [0.9516129  0.96370968 0.95951417 0.96356275 0.9757085  0.95951417
 0.95951417 0.95967742 0.95564516 0.96774194]

mean value: 0.961620086195638

key: test_roc_auc
value: [0.98148148 0.9642401  0.96428571 0.94642857 0.82142857 0.92857143
 0.91071429 0.9457672  0.91005291 0.9265873 ]

mean value: 0.9299557562488597

key: train_roc_auc
value: [0.95801989 0.96999713 0.96597756 0.96603335 0.97801173 0.96400905
 0.96400905 0.96999619 0.95813754 0.96812294]

mean value: 0.9662314429920021

key: test_jcc
value: [0.96296296 0.92857143 0.93103448 0.90322581 0.6969697  0.85714286
 0.83870968 0.89655172 0.83870968 0.85714286]

mean value: 0.8711021170976677

key: train_jcc
value: [0.91828794 0.94094488 0.93307087 0.93333333 0.95634921 0.92941176
 0.92941176 0.94071146 0.91860465 0.9375    ]

mean value: 0.9337625868482374

/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


MCC on Blind test: 0.23

Accuracy on Blind test: 0.65

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.02240419 0.00845432 0.00848174 0.0077436  0.00821161 0.00828028
 0.00818753 0.00795102 0.00834084 0.00831008]

mean value: 0.009636521339416504

key: score_time
value: [0.01104975 0.00858045 0.00879431 0.0087378  0.00867748 0.0087285
 0.00872374 0.00838089 0.00867343 0.00884008]

mean value: 0.0089186429977417

key: test_mcc
value: [0.74266517 0.48372032 0.77459667 0.71611487 0.40574111 0.61065803
 0.55328334 0.68300095 0.74935731 0.74935731]

mean value: 0.6468495081997219

key: train_mcc
value: [0.66487805 0.68935419 0.66458942 0.66570983 0.62725669 0.69324149
 0.67986963 0.68418537 0.66184784 0.66877084]

mean value: 0.6699703343840876

key: test_accuracy
value: [0.85714286 0.73214286 0.875      0.85714286 0.69642857 0.80357143
 0.76785714 0.83636364 0.87272727 0.87272727]

mean value: 0.8171103896103896

key: train_accuracy
value: [0.82634731 0.83832335 0.82634731 0.8243513  0.79840319 0.84231537
 0.83433134 0.83665339 0.8247012  0.82669323]

mean value: 0.8278466970441587

key: test_fscore
value: [0.82608696 0.66666667 0.85714286 0.85185185 0.65306122 0.79245283
 0.73469388 0.81632653 0.8627451  0.8627451 ]

mean value: 0.7923772991103286

key: train_fscore
value: [0.80536913 0.81879195 0.80449438 0.79816514 0.75662651 0.82560706
 0.81431767 0.81777778 0.80269058 0.80272109]

mean value: 0.8046561286055279

key: test_precision
value: [1.         0.83333333 1.         0.88461538 0.76190476 0.84
 0.85714286 0.90909091 0.91666667 0.91666667]

mean value: 0.8919420579420579

key: train_precision
value: [0.90452261 0.91959799 0.9040404  0.92063492 0.93452381 0.90776699
 0.91       0.91089109 0.9040404  0.91709845]

mean value: 0.9133116666250641

key: test_recall
value: [0.7037037  0.55555556 0.75       0.82142857 0.57142857 0.75
 0.64285714 0.74074074 0.81481481 0.81481481]

mean value: 0.7165343915343916

key: train_recall
value: [0.72580645 0.73790323 0.72469636 0.70445344 0.63562753 0.75708502
 0.73684211 0.74193548 0.72177419 0.71370968]

mean value: 0.7199833485699361

key: test_roc_auc
value: [0.85185185 0.72605364 0.875      0.85714286 0.69642857 0.80357143
 0.76785714 0.83465608 0.87169312 0.87169312]

mean value: 0.8155947819740923

key: train_roc_auc
value: [0.82535382 0.83733106 0.8249466  0.82269916 0.79616022 0.84114094
 0.83298798 0.83553467 0.82348552 0.82535878]

mean value: 0.8264998750879309

key: test_jcc
value: [0.7037037  0.5        0.75       0.74193548 0.48484848 0.65625
 0.58064516 0.68965517 0.75862069 0.75862069]

mean value: 0.6624279385437617

key: train_jcc
value: [0.6741573  0.69318182 0.67293233 0.66412214 0.60852713 0.70300752
 0.68679245 0.69172932 0.67041199 0.67045455]

mean value: 0.6735316546975922
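The fold-1 values of this Gaussian NB run (accuracy 0.85714286, precision 1.0, recall 0.7037037) are consistent with TP = 19, FN = 8, FP = 0, TN = 29 in a 56-row fold; these counts are inferred, not printed. Plugging them into the MCC formula reproduces the logged test_mcc, and the logged test_roc_auc equals the balanced accuracy (mean of recall and specificity), which is what ROC AUC reduces to when computed from hard 0/1 predictions instead of scores. That suggests, though the log does not confirm it, that the AUC scorer receives predicted labels. A stdlib sketch with hypothetical helper names:

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc_from_labels(tp, fp, fn, tn):
    """ROC AUC of hard 0/1 predictions: the mean of TPR and TNR (balanced accuracy)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2

# Counts inferred (not printed) for fold 1 of the Gaussian NB run
counts = dict(tp=19, fp=0, fn=8, tn=29)
print(mcc(**counts))              # ~0.74266517, matching test_mcc
print(auc_from_labels(**counts))  # ~0.85185185, matching test_roc_auc
```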

MCC on Blind test: 0.34

Accuracy on Blind test: 0.78

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00915837 0.008641   0.00853086 0.00855017 0.00856733 0.00867844
 0.00850987 0.00852513 0.00848937 0.00854254]

mean value: 0.008619308471679688

key: score_time
value: [0.00892973 0.00885773 0.00864935 0.00866771 0.00876236 0.00893545
 0.0087409  0.00873089 0.00877738 0.00871181]

mean value: 0.008776330947875976

key: test_mcc
value: [0.89342711 0.74984143 0.85714286 0.71428571 0.67900461 0.78571429
 0.64285714 0.71049701 0.75878131 0.74935731]

mean value: 0.7540908782235038

key: train_mcc
value: [0.76073062 0.76464682 0.75244668 0.78078676 0.77655234 0.75249829
 0.76042979 0.77325226 0.78086182 0.77758373]

mean value: 0.7679789125420294

key: test_accuracy
value: [0.94642857 0.875      0.92857143 0.85714286 0.83928571 0.89285714
 0.82142857 0.85454545 0.87272727 0.87272727]

mean value: 0.8760714285714286

key: train_accuracy
value: [0.88023952 0.88223553 0.8762475  0.89021956 0.88822355 0.8762475
 0.88023952 0.88645418 0.89043825 0.88844622]

mean value: 0.8838991340029105

key: test_fscore
value: [0.94545455 0.86792453 0.92857143 0.85714286 0.84210526 0.89285714
 0.82142857 0.84615385 0.88135593 0.8627451 ]

mean value: 0.8745739213310778

key: train_fscore
value: [0.88047809 0.88223553 0.87449393 0.89021956 0.8875502  0.875
 0.87804878 0.88667992 0.88933602 0.88932806]

mean value: 0.8833370085701109

key: test_precision
value: [0.92857143 0.88461538 0.92857143 0.85714286 0.82758621 0.89285714
 0.82142857 0.88       0.8125     0.91666667]

mean value: 0.8749939686750031

key: train_precision
value: [0.87007874 0.87351779 0.87449393 0.87795276 0.88047809 0.87148594
 0.88163265 0.8745098  0.8875502  0.87209302]

mean value: 0.8763792922216086

key: test_recall
value: [0.96296296 0.85185185 0.92857143 0.85714286 0.85714286 0.89285714
 0.82142857 0.81481481 0.96296296 0.81481481]

mean value: 0.8764550264550264

key: train_recall
value: [0.89112903 0.89112903 0.87449393 0.90283401 0.89473684 0.87854251
 0.87449393 0.89919355 0.89112903 0.90725806]

mean value: 0.8904939924252318

key: test_roc_auc
value: [0.94699872 0.87420179 0.92857143 0.85714286 0.83928571 0.89285714
 0.82142857 0.85383598 0.87433862 0.87169312]

mean value: 0.8760353950009122

key: train_roc_auc
value: [0.88034712 0.88232341 0.87622334 0.89039338 0.8883133  0.87627913
 0.88016035 0.88660465 0.89044641 0.8886684 ]

mean value: 0.8839759495598507

key: test_jcc
value: [0.89655172 0.76666667 0.86666667 0.75       0.72727273 0.80645161
 0.6969697  0.73333333 0.78787879 0.75862069]

mean value: 0.7790411905484208

key: train_jcc
value: [0.78647687 0.78928571 0.77697842 0.80215827 0.79783394 0.77777778
 0.7826087  0.79642857 0.80072464 0.80071174]

mean value: 0.7910984634590573

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72
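
The key / value / mean value blocks in this log have the shape of the dictionary returned by scikit-learn's cross_validate when it is given a dictionary of scorers and return_train_score=True, with ten values per key (one per fold). The script that wrote the log is not shown here, so the following is only a sketch under that assumption: the scorer names (mcc, fscore, jcc and so on) are chosen to reproduce the logged keys, the shuffled StratifiedKFold splitter is assumed, and make_classification is a synthetic stand-in for the real feature matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical scorer names, picked so the result keys match the log
# (test_mcc, train_mcc, test_fscore, ..., test_jcc, train_jcc)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; the real features are not part of this log
X, y = make_classification(n_samples=557, n_features=20, random_state=42)

# Assumed splitter: 10 stratified, shuffled folds
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(LogisticRegression(random_state=42), X, y,
                         cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print()
    print("mean value:", np.mean(value))
    print()
```

With return_train_score=True, each scorer contributes both a test_ and a train_ entry, which is why every metric in the log appears twice.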

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00800991 0.00789165 0.00820088 0.00811696 0.00825524 0.00775909
 0.00819016 0.00785351 0.00778246 0.00778031]

mean value: 0.007984018325805664

key: score_time
value: [0.08504152 0.01459813 0.01290846 0.01306796 0.0130856  0.01207209
 0.01181221 0.01277232 0.01141    0.01144505]

mean value: 0.01982133388519287

key: test_mcc
value: [0.85696041 0.74984143 0.78772636 0.67900461 0.75047877 0.78571429
 0.64450339 0.74569602 0.65330526 0.78353876]

mean value: 0.7436769286015497

key: train_mcc
value: [0.79646836 0.79243629 0.77242951 0.80040802 0.78877235 0.78048897
 0.78837632 0.78487523 0.80887676 0.7817104 ]

mean value: 0.7894842220535155

key: test_accuracy
value: [0.92857143 0.875      0.89285714 0.83928571 0.875      0.89285714
 0.82142857 0.87272727 0.81818182 0.89090909]

mean value: 0.8706818181818182

key: train_accuracy
value: [0.89820359 0.89620758 0.88622754 0.9001996  0.89421158 0.89021956
 0.89421158 0.89243028 0.90438247 0.89043825]

mean value: 0.8946732033940088

key: test_fscore
value: [0.92592593 0.86792453 0.89655172 0.83636364 0.87272727 0.89285714
 0.82758621 0.86792453 0.83333333 0.88461538]

mean value: 0.8705809683460952

key: train_fscore
value: [0.89779559 0.89558233 0.88484848 0.89919355 0.89421158 0.88933602
 0.89249493 0.89156627 0.904      0.89151874]

mean value: 0.8940547478417012

key: test_precision
value: [0.92592593 0.88461538 0.86666667 0.85185185 0.88888889 0.89285714
 0.8        0.88461538 0.75757576 0.92      ]

mean value: 0.8672997002997003

key: train_precision
value: [0.89243028 0.892      0.88306452 0.89558233 0.88188976 0.884
 0.89430894 0.888      0.8968254  0.87258687]

mean value: 0.8880688100611991

key: test_recall
value: [0.92592593 0.85185185 0.92857143 0.82142857 0.85714286 0.89285714
 0.85714286 0.85185185 0.92592593 0.85185185]

mean value: 0.8764550264550265

key: train_recall
value: [0.90322581 0.89919355 0.88663968 0.90283401 0.90688259 0.89473684
 0.89068826 0.89516129 0.91129032 0.91129032]

mean value: 0.9001942666840799

key: test_roc_auc
value: [0.9284802  0.87420179 0.89285714 0.83928571 0.875      0.89285714
 0.82142857 0.8723545  0.82010582 0.89021164]

mean value: 0.8706782521437694

key: train_roc_auc
value: [0.89825322 0.89623709 0.88623322 0.9002359  0.89438618 0.89028181
 0.89416303 0.89246253 0.90446406 0.89068453]

mean value: 0.8947401572130679

key: test_jcc
value: [0.86206897 0.76666667 0.8125     0.71875    0.77419355 0.80645161
 0.70588235 0.76666667 0.71428571 0.79310345]

mean value: 0.772056897564365

key: train_jcc
value: [0.81454545 0.81090909 0.79347826 0.81684982 0.80866426 0.80072464
 0.80586081 0.80434783 0.82481752 0.80427046]

mean value: 0.8084468133612275

MCC on Blind test: 0.25

Accuracy on Blind test: 0.72
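
Every "Running model pipeline" entry uses the same preprocessing: a ColumnTransformer that applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones (remainder='passthrough'), followed by the estimator under the step name 'model'. The sketch below rebuilds that layout on a toy DataFrame. The rows are invented for illustration, only four of the logged column names are used, and KNeighborsClassifier is taken from the block above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy rows: values are invented, only the column names come from the log
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 8.0, 25.2, 5.5, 18.9, 2.7, 30.1],
    "rsa": [0.05, 0.60, 0.30, 0.90, 0.10, 0.75, 0.02, 0.85],
    "ss_class": ["H", "E", "C", "C", "H", "E", "H", "C"],
    "active_site": [1, 0, 0, 0, 1, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Same structure as the logged Pipeline: 'prep' step, then 'model' step
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    ("model", KNeighborsClassifier()),
])
pipe.fit(df, y)

# 2 scaled numeric columns + 3 ss_class levels + 2 active_site levels
Xt = pipe.named_steps["prep"].transform(df)
print(Xt.shape)
print(pipe.predict(df))
```

Because the scaler sits inside the Pipeline, cross-validation refits it on each training fold, so the held-out fold does not influence the scaling. Distance-based models such as KNN are sensitive to feature ranges, which is one reason the MinMaxScaler step matters for them.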

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01613522 0.01378536 0.01915169 0.01421118 0.01374722 0.01693225
 0.014081   0.0139122  0.01385522 0.01448727]

mean value: 0.01502985954284668

key: score_time
value: [0.00889826 0.00894713 0.00886488 0.00883365 0.00874567 0.00876617
 0.00901341 0.00938988 0.00874805 0.00884128]

mean value: 0.008904838562011718

key: test_mcc
value: [0.89342711 0.82149863 0.89342711 0.71428571 0.67900461 0.85714286
 0.71611487 0.71049701 0.71735629 0.74935731]

mean value: 0.7752111518648109

key: train_mcc
value: [0.77670104 0.78487855 0.77670104 0.79675795 0.80065667 0.78078676
 0.79658289 0.79328084 0.79284399 0.78122197]

mean value: 0.7880411697036507

key: test_accuracy
value: [0.94642857 0.91071429 0.94642857 0.85714286 0.83928571 0.92857143
 0.85714286 0.85454545 0.85454545 0.87272727]

mean value: 0.8867532467532467

key: train_accuracy
value: [0.88822355 0.89221557 0.88822355 0.89820359 0.9001996  0.89021956
 0.89820359 0.89641434 0.89641434 0.89043825]

mean value: 0.8938755954227005

key: test_fscore
value: [0.94545455 0.90566038 0.94736842 0.85714286 0.84210526 0.92857143
 0.86206897 0.84615385 0.86206897 0.8627451 ]

mean value: 0.8859339767965393

key: train_fscore
value: [0.88844622 0.89285714 0.888      0.89820359 0.9        0.89021956
 0.89779559 0.8968254  0.89558233 0.89065606]

mean value: 0.893858589263252

key: test_precision
value: [0.92857143 0.92307692 0.93103448 0.85714286 0.82758621 0.92857143
 0.83333333 0.88       0.80645161 0.91666667]

mean value: 0.8832434939921036

key: train_precision
value: [0.87795276 0.87890625 0.87747036 0.88582677 0.88932806 0.87795276
 0.88888889 0.8828125  0.892      0.87843137]

mean value: 0.8829569713874807

key: test_recall
value: [0.96296296 0.88888889 0.96428571 0.85714286 0.85714286 0.92857143
 0.89285714 0.81481481 0.92592593 0.81481481]

mean value: 0.8907407407407407

key: train_recall
value: [0.89919355 0.90725806 0.89878543 0.91093117 0.91093117 0.90283401
 0.90688259 0.91129032 0.89919355 0.90322581]

mean value: 0.9050525662792216

key: test_roc_auc
value: [0.94699872 0.90996169 0.94642857 0.85714286 0.83928571 0.92857143
 0.85714286 0.85383598 0.85582011 0.87169312]

mean value: 0.8866881043605181

key: train_roc_auc
value: [0.88833195 0.89236421 0.88836909 0.89837897 0.90034748 0.89039338
 0.89832319 0.89659004 0.89644717 0.89058928]

mean value: 0.8940134761930483

key: test_jcc
value: [0.89655172 0.82758621 0.9        0.75       0.72727273 0.86666667
 0.75757576 0.73333333 0.75757576 0.75862069]

mean value: 0.7975182863113898

key: train_jcc
value: [0.79928315 0.80645161 0.79856115 0.81521739 0.81818182 0.80215827
 0.81454545 0.81294964 0.81090909 0.80286738]

mean value: 0.8081124970226548

MCC on Blind test: 0.22

Accuracy on Blind test: 0.71
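
The SVM scores about 0.89 accuracy and 0.78 MCC in cross-validation but 0.71 accuracy and only 0.22 MCC on the blind test. MCC uses all four cells of the confusion matrix, so it can fall much further than accuracy when predictions on one class degrade. The two blind-test lines could be produced with matthews_corrcoef and accuracy_score from sklearn.metrics; the script's actual code is not shown, so that is an assumption. The sketch below computes MCC from the confusion matrix by hand and checks it against scikit-learn, using invented toy labels.

```python
import math

from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Toy labels for illustration only; the real blind-test set is not in the log
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
manual_mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

mcc = matthews_corrcoef(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
print(f"MCC on Blind test: {mcc:.2f}")
print(f"Accuracy on Blind test: {acc:.2f}")
```

Here TP = TN = 4 and FP = FN = 1, so MCC = 15 / 25 = 0.60 while accuracy is 0.80.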

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.44565105 1.50711632 1.54303265 1.40061927 1.64806414 1.59813452
 1.42913032 1.62204456 1.88645577 1.49996781]

mean value: 1.558021640777588

key: score_time
value: [0.01188302 0.01363969 0.01324034 0.01340175 0.01373863 0.01372957
 0.0137701  0.02073598 0.01149464 0.0171504 ]

mean value: 0.014278411865234375

key: test_mcc
value: [0.96490128 0.89342711 0.82195294 0.93094934 0.75047877 0.83484711
 0.82195294 0.81878307 0.79069197 0.8565805 ]

mean value: 0.8484565042398484

key: train_mcc
value: [0.96407453 0.96407453 0.97604323 0.96407052 0.97205662 0.97604323
 0.96809206 0.96812294 0.96812294 0.96018795]

mean value: 0.9680888542163917

key: test_accuracy
value: [0.98214286 0.94642857 0.91071429 0.96428571 0.875      0.91071429
 0.91071429 0.90909091 0.89090909 0.92727273]

mean value: 0.9227272727272727

key: train_accuracy
value: [0.98203593 0.98203593 0.98802395 0.98203593 0.98602794 0.98802395
 0.98403194 0.98406375 0.98406375 0.98007968]

mean value: 0.9840422740177016

key: test_fscore
value: [0.98181818 0.94545455 0.9122807  0.96551724 0.87719298 0.90196078
 0.9122807  0.90909091 0.89655172 0.92307692]

mean value: 0.9225224695236438

key: train_fscore
value: [0.98181818 0.98181818 0.98785425 0.98174442 0.98580122 0.98785425
 0.98387097 0.98387097 0.98387097 0.97991968]

mean value: 0.9838423086546554

key: test_precision
value: [0.96428571 0.92857143 0.89655172 0.93333333 0.86206897 1.
 0.89655172 0.89285714 0.83870968 0.96      ]

mean value: 0.9172929710260077

key: train_precision
value: [0.98380567 0.98380567 0.98785425 0.98373984 0.98780488 0.98785425
 0.97991968 0.98387097 0.98387097 0.976     ]

mean value: 0.9838526167702565

key: test_recall
value: [1.         0.96296296 0.92857143 1.         0.89285714 0.82142857
 0.92857143 0.92592593 0.96296296 0.88888889]

mean value: 0.9312169312169312

key: train_recall
value: [0.97983871 0.97983871 0.98785425 0.97975709 0.98380567 0.98785425
 0.98785425 0.98387097 0.98387097 0.98387097]

mean value: 0.983841582865352

key: test_roc_auc
value: [0.98275862 0.94699872 0.91071429 0.96428571 0.875      0.91071429
 0.91071429 0.90939153 0.89219577 0.9265873 ]

mean value: 0.9229360518153622

key: train_roc_auc
value: [0.98201422 0.98201422 0.98802161 0.98200453 0.98599732 0.98802161
 0.98408461 0.98406147 0.98406147 0.98012446]

mean value: 0.9840405511662667

key: test_jcc
value: [0.96428571 0.89655172 0.83870968 0.93333333 0.78125    0.82142857
 0.83870968 0.83333333 0.8125     0.85714286]

mean value: 0.857724488850045

key: train_jcc
value: [0.96428571 0.96428571 0.976      0.96414343 0.972      0.976
 0.96825397 0.96825397 0.96825397 0.96062992]

mean value: 0.9682106680887996

MCC on Blind test: 0.24

Accuracy on Blind test: 0.62
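
Each ConvergenceWarning in the MLP block means a fit of MLPClassifier(max_iter=500) stopped at the iteration cap before the optimiser's tolerance criterion was met. Two common responses are raising max_iter or setting early_stopping=True, which holds out validation_fraction of the training data and stops once the validation score has not improved for n_iter_no_change epochs. The sketch below counts ConvergenceWarnings for both settings on synthetic data (make_classification stands in for the real features, and count_convergence_warnings is a hypothetical helper); this log does not establish whether either configuration converges on the real data.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the real feature matrix is not part of this log
X, y = make_classification(n_samples=557, n_features=20, random_state=42)


def count_convergence_warnings(model):
    """Hypothetical helper: fit the model and count ConvergenceWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


baseline = MLPClassifier(max_iter=500, random_state=42)
early = MLPClassifier(max_iter=500, early_stopping=True,
                      validation_fraction=0.1, n_iter_no_change=10,
                      random_state=42)

for name, model in [("baseline", baseline), ("early_stopping", early)]:
    n_warn = count_convergence_warnings(model)
    print(f"{name}: {n_warn} ConvergenceWarning(s), "
          f"stopped after {model.n_iter_} iterations")
```

Early stopping also acts as a regulariser, which may be relevant here given the gap between the MLP's cross-validated accuracy (about 0.92) and its blind-test accuracy (0.62).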

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01453757 0.01242089 0.01000428 0.01067281 0.00980949 0.01040387
 0.01071548 0.0110333  0.01046252 0.01084757]

mean value: 0.011090779304504394

key: score_time
value: [0.01091075 0.00837231 0.00808406 0.00912213 0.00790167 0.00789714
 0.00791621 0.00794125 0.00791454 0.00789595]

mean value: 0.00839560031890869

key: test_mcc
value: [1.         0.85696041 0.78772636 0.92857143 0.82195294 0.89802651
 0.79385662 0.89153439 1.         0.74935731]

mean value: 0.8727985977030033

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.92857143 0.89285714 0.96428571 0.91071429 0.94642857
 0.89285714 0.94545455 1.         0.87272727]

mean value: 0.9353896103896104

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.92592593 0.89655172 0.96428571 0.9122807  0.94339623
 0.9        0.94545455 1.         0.8627451 ]

mean value: 0.9350639936012812

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.92592593 0.86666667 0.96428571 0.89655172 1.
 0.84375    0.92857143 1.         0.91666667]

mean value: 0.9342418126254333

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.92592593 0.92857143 0.96428571 0.92857143 0.89285714
 0.96428571 0.96296296 1.         0.81481481]

mean value: 0.9382275132275132

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9284802  0.89285714 0.96428571 0.91071429 0.94642857
 0.89285714 0.9457672  1.         0.87169312]

mean value: 0.9353083378945448

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.86206897 0.8125     0.93103448 0.83870968 0.89285714
 0.81818182 0.89655172 1.         0.75862069]

mean value: 0.8810524500527281

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.36
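
For the Decision Tree, every training-fold score is 1.0 while the blind-test MCC is 0.11. That pattern is consistent with DecisionTreeClassifier's defaults (max_depth=None, so the tree grows until its leaves are pure) fitting the training folds exactly. Assuming "mean value" is the plain arithmetic mean of the ten fold scores, the train-test gap can be computed directly from the logged rows. The snippet below uses the test_mcc row copied from this block; mean_gap is a hypothetical helper.

```python
import numpy as np

# test_mcc fold scores copied from the Decision Tree block of the log
test_mcc = np.array([1.0, 0.85696041, 0.78772636, 0.92857143, 0.82195294,
                     0.89802651, 0.79385662, 0.89153439, 1.0, 0.74935731])
train_mcc = np.ones(10)  # every training fold scored 1.0


def mean_gap(train_scores, test_scores):
    """Hypothetical helper: mean training score minus mean test score."""
    return float(np.mean(train_scores) - np.mean(test_scores))


print("mean test MCC:", test_mcc.mean())
print("train-test gap:", round(mean_gap(train_mcc, test_mcc), 4))
```

The recomputed mean matches the logged 0.8728 to within print rounding. The gap is about 0.13 inside cross-validation but about 0.76 against the blind-test MCC (1.0 vs 0.11), so constraining the tree (for example via max_depth or min_samples_leaf) may be worth trying before comparing it with the ensembles.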

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10190248 0.09926343 0.10202909 0.10004902 0.10008192 0.10025787
 0.10022664 0.09921432 0.10032749 0.10042095]

mean value: 0.10037732124328613

key: score_time
value: [0.01691294 0.01694822 0.0180881  0.01682162 0.01716375 0.01716638
 0.01710129 0.0170927  0.01716757 0.01707959]

mean value: 0.017154216766357422

key: test_mcc
value: [0.93103448 0.78544061 0.89342711 0.89342711 0.78571429 0.82195294
 0.82618439 0.85449735 0.82337971 0.78961518]

mean value: 0.840467318391085

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96428571 0.89285714 0.94642857 0.94642857 0.89285714 0.91071429
 0.91071429 0.92727273 0.90909091 0.89090909]

mean value: 0.9191558441558442

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.88888889 0.94736842 0.94736842 0.89285714 0.90909091
 0.91525424 0.92592593 0.9122807  0.88      ]

mean value: 0.9183320362196365

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.93103448 0.88888889 0.93103448 0.93103448 0.89285714 0.92592593
 0.87096774 0.92592593 0.86666667 0.95652174]

mean value: 0.9120857479606331

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.88888889 0.96428571 0.96428571 0.89285714 0.89285714
 0.96428571 0.92592593 0.96296296 0.81481481]

mean value: 0.9271164021164021

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96551724 0.89272031 0.94642857 0.94642857 0.89285714 0.91071429
 0.91071429 0.92724868 0.91005291 0.88955026]

mean value: 0.9192232256887429

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.8        0.9        0.9        0.80645161 0.83333333
 0.84375    0.86206897 0.83870968 0.78571429]

mean value: 0.8501062357646062

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.33

Accuracy on Blind test: 0.72
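
The test_jcc rows move in lockstep with test_fscore. If jcc is the Jaccard index of the positive class (an assumption; the scorer definition is not in the log), this is expected. For binary labels, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), which together give J = F1/(2 - F1). The check below applies that identity to the Extra Trees rows copied from the log.

```python
import numpy as np

# test_fscore and test_jcc rows copied from the Extra Trees block of the log
f1 = np.array([0.96428571, 0.88888889, 0.94736842, 0.94736842, 0.89285714,
               0.90909091, 0.91525424, 0.92592593, 0.9122807, 0.88])
jcc = np.array([0.93103448, 0.8, 0.9, 0.9, 0.80645161,
                0.83333333, 0.84375, 0.86206897, 0.83870968, 0.78571429])

# Binary case: J = TP/(TP+FP+FN), F1 = 2TP/(2TP+FP+FN)  =>  J = F1 / (2 - F1)
jcc_from_f1 = f1 / (2 - f1)
print("max abs difference:", np.max(np.abs(jcc_from_f1 - jcc)))
```

The differences come only from the eight-digit rounding in the log, so test_jcc adds no information beyond test_fscore for ranking models.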

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])
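The pipeline repr above has three parts. It scales the numerical columns with `MinMaxScaler` and one-hot encodes the categorical columns with `OneHotEncoder`. Any remaining columns pass through unchanged (`remainder='passthrough'`), and the result goes to the classifier. Below is a minimal sketch of an equivalent construction. It uses a tiny hypothetical DataFrame: a few column names come from the log, but every value is made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import ExtraTreeClassifier

# Hypothetical toy data: column names from the log, values invented
X = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.5, 5.0, 15.2],
    "rsa": [0.05, 0.62, 0.30, 0.91, 0.12, 0.44],
    "ss_class": ["H", "E", "L", "H", "L", "E"],
    "water_change": ["hydrophobic", "hydrophilic", "hydrophobic",
                     "hydrophilic", "hydrophobic", "hydrophilic"],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "water_change"]

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),   # rescale to [0, 1]
                  ("cat", OneHotEncoder(), cat_cols)],  # one column per category
    remainder="passthrough")                            # keep any other columns as-is

pipe = Pipeline(steps=[("prep", prep),
                       ("model", ExtraTreeClassifier(random_state=42))])
pipe.fit(X, y)
print(pipe.predict(X))
```

Putting the scaler and encoder inside the pipeline means they are refitted on each training fold during cross-validation, so no information from the test fold leaks into the preprocessing.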

key: fit_time
value: [0.0078423  0.00767851 0.00772405 0.00781775 0.00766301 0.00781226
 0.00765562 0.00782037 0.00784039 0.00778151]

mean value: 0.007763576507568359

key: score_time
value: [0.0079782  0.00796032 0.00801039 0.0080893  0.00807023 0.0080111
 0.00799298 0.00796652 0.00790787 0.00800228]

mean value: 0.007998919486999512

key: test_mcc
value: [0.96490128 0.82661701 0.85933785 0.75047877 0.4645821  0.75434227
 0.67900461 0.58684513 0.85695439 0.82269299]

mean value: 0.7565756396515464

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98214286 0.91071429 0.92857143 0.875      0.73214286 0.875
 0.83928571 0.78181818 0.92727273 0.90909091]

mean value: 0.876103896103896

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.9122807  0.93103448 0.87719298 0.72727273 0.86792453
 0.84210526 0.73913043 0.92857143 0.90196078]

mean value: 0.87092915151876

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.86666667 0.9        0.86206897 0.74074074 0.92
 0.82758621 0.89473684 0.89655172 0.95833333]

mean value: 0.8830970193683443

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96296296 0.96428571 0.89285714 0.71428571 0.82142857
 0.85714286 0.62962963 0.96296296 0.85185185]

mean value: 0.8657407407407407

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98275862 0.91251596 0.92857143 0.875      0.73214286 0.875
 0.83928571 0.77910053 0.92791005 0.90806878]

mean value: 0.8760353950009123

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.83870968 0.87096774 0.78125    0.57142857 0.76666667
 0.72727273 0.5862069  0.86666667 0.82142857]

mean value: 0.7794883233655481

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
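The per-fold metrics are consistent with a single confusion matrix per fold. Take the first Extra Tree fold: test_accuracy 0.98214286, test_recall 1.0 and test_precision 0.96428571. These fit a 56-sample fold with TP=27, FN=0, FP=1 and TN=28. The counts are inferred from the logged values; the script does not print them. Recomputing the other metrics from these counts reproduces the log. It also shows that `test_roc_auc` here equals balanced accuracy, which is what `roc_auc_score` gives when it is fed hard 0/1 predictions instead of probabilities.

```python
from math import sqrt

# Confusion counts inferred for Extra Tree, fold 1 (not printed by the script)
tp, fn, fp, tn = 27, 0, 1, 28

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# With hard labels the ROC curve has one interior point, so AUC = balanced accuracy
roc_auc = (recall + tn / (tn + fp)) / 2

for name, val in [("accuracy", accuracy), ("precision", precision),
                  ("recall", recall), ("fscore", fscore), ("jcc", jcc),
                  ("mcc", mcc), ("roc_auc", roc_auc)]:
    print(f"{name}: {val:.8f}")
```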

MCC on Blind test: 0.24

Accuracy on Blind test: 0.73
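Each blind-test line holds a single number, not a fold array. That suggests each model is refitted once and scored on a separate held-out set, with the results rounded to two decimals. Below is a minimal sketch under that assumption. The split and the names `X_bts`/`y_bts` are hypothetical, and synthetic data replaces the real blind set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier

X, y = make_classification(n_samples=800, n_features=20, random_state=42)

# Hypothetical blind-test split held out from all cross-validation
X_train, X_bts, y_train, y_bts = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = ExtraTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_bts)

bts_mcc = round(matthews_corrcoef(y_bts, y_pred), 2)
bts_acc = round(accuracy_score(y_bts, y_pred), 2)
print("MCC on Blind test:", bts_mcc)
print("\nAccuracy on Blind test:", bts_acc)
```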

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.32765579 1.30277157 1.36390686 1.35351229 1.34183788 1.42637658
 1.35306215 1.29488921 1.30548406 1.29170752]

mean value: 1.3361203908920287

key: score_time
value: [0.0910337  0.09533978 0.09802961 0.09628296 0.09906578 0.09774327
 0.09189868 0.09146214 0.09067702 0.0920558 ]

mean value: 0.09435887336730957

key: test_mcc
value: [1.         0.89342711 0.92857143 0.93094934 0.78571429 0.93094934
 0.96490128 0.89153439 1.         0.89139151]

mean value: 0.9217438682406724

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.94642857 0.96428571 0.96428571 0.89285714 0.96428571
 0.98214286 0.94545455 1.         0.94545455]

mean value: 0.9605194805194806

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.94545455 0.96428571 0.96551724 0.89285714 0.96296296
 0.98245614 0.94545455 1.         0.94339623]

mean value: 0.9602384519160193

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.92857143 0.96428571 0.93333333 0.89285714 1.
 0.96551724 0.92857143 1.         0.96153846]

mean value: 0.957467475053682

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96296296 0.96428571 1.         0.89285714 0.92857143
 1.         0.96296296 1.         0.92592593]

mean value: 0.9637566137566138

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.94699872 0.96428571 0.96428571 0.89285714 0.96428571
 0.98214286 0.9457672  1.         0.94510582]

mean value: 0.960572888159095

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.89655172 0.93103448 0.93333333 0.80645161 0.92857143
 0.96551724 0.89655172 1.         0.89285714]

mean value: 0.9250868690078924

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.19

Accuracy on Blind test: 0.49

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
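The FutureWarning printed for Random Forest2 comes from `max_features='auto'`, which scikit-learn deprecated in 1.1 and removed in 1.3. The warning text itself says that `'sqrt'` keeps the past behaviour for classifiers. Below is a sketch of a drop-in replacement for the logged model, fitted on synthetic data to confirm that no `max_features` warning is raised.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same settings as 'Random Forest2', with the deprecated 'auto' replaced by 'sqrt'
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Record every warning raised during fit, then keep only max_features ones
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)
max_features_warnings = [w for w in caught if "max_features" in str(w.message)]

print("max_features warnings:", len(max_features_warnings))
print(f"OOB score: {rf2.oob_score_:.3f}")
```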

key: fit_time
value: [1.78958488 0.90520978 0.92161393 0.90113878 1.04687214 0.9353826
 0.92672181 0.89199233 0.91325641 0.93072248]

mean value: 1.0162495136260987

key: score_time
value: [0.24019027 0.16753531 0.25853181 0.21053815 0.25056458 0.25201178
 0.21247077 0.21787858 0.26128078 0.26930881]

mean value: 0.234031081199646

key: test_mcc
value: [1.         0.89342711 0.92857143 0.93094934 0.85714286 0.93094934
 0.96490128 0.89153439 1.         0.8565805 ]

mean value: 0.9254056246826363

key: train_mcc
value: [0.94423549 0.94817282 0.94817035 0.94817035 0.95628198 0.94423372
 0.94817035 0.95231443 0.94043131 0.94434567]

mean value: 0.9474526465723194

key: test_accuracy
value: [1.         0.94642857 0.96428571 0.96428571 0.92857143 0.96428571
 0.98214286 0.94545455 1.         0.92727273]

mean value: 0.9622727272727273

key: train_accuracy
value: [0.97205589 0.9740519  0.9740519  0.9740519  0.97804391 0.97205589
 0.9740519  0.97609562 0.97011952 0.97211155]

mean value: 0.9736689966680185

key: test_fscore
value: [1.         0.94545455 0.96428571 0.96551724 0.92857143 0.96296296
 0.98245614 0.94545455 1.         0.92307692]

mean value: 0.9617779501536308

key: train_fscore
value: [0.972      0.9739479  0.97384306 0.97384306 0.97795591 0.97188755
 0.97384306 0.976      0.97005988 0.972     ]

mean value: 0.9735380413105856

key: test_precision
value: [1.         0.92857143 0.96428571 0.93333333 0.92857143 1.
 0.96551724 0.92857143 1.         0.96      ]

mean value: 0.9608850574712644

key: train_precision
value: [0.96428571 0.96812749 0.968      0.968      0.96825397 0.96414343
 0.968      0.96825397 0.96047431 0.96428571]

mean value: 0.9661824589714422

key: test_recall
value: [1.         0.96296296 0.96428571 1.         0.92857143 0.92857143
 1.         0.96296296 1.         0.88888889]

mean value: 0.9636243386243386

key: train_recall
value: [0.97983871 0.97983871 0.97975709 0.97975709 0.98785425 0.97975709
 0.97975709 0.98387097 0.97983871 0.97983871]

mean value: 0.981010839754473

key: test_roc_auc
value: [1.         0.94699872 0.96428571 0.96428571 0.92857143 0.96428571
 0.98214286 0.9457672  1.         0.9265873 ]

mean value: 0.9622924648786718

key: train_roc_auc
value: [0.97213279 0.97410908 0.97413051 0.97413051 0.97817909 0.97216201
 0.97413051 0.97618745 0.97023432 0.97220282]

mean value: 0.9737599093111167

key: test_jcc
value: [1.         0.89655172 0.93103448 0.93333333 0.86666667 0.92857143
 0.96551724 0.89655172 1.         0.85714286]

mean value: 0.9275369458128079

key: train_jcc
value: [0.94552529 0.94921875 0.94901961 0.94901961 0.95686275 0.9453125
 0.94901961 0.953125   0.94186047 0.94552529]

mean value: 0.9484488867401317
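Unlike the unconstrained tree models earlier in this log, Random Forest2 does not reach 1.0 on the training folds. This fits `min_samples_leaf=5`, which stops any leaf from isolating a single training sample. The sketch below compares the logged mean MCC values side by side; all numbers are copied from this log. The CV train/test gap narrows, but the blind-test MCC stays low for all three tree ensembles.

```python
# Mean MCC values copied from the log above
logged = {
    # model: (mean train_mcc, mean test_mcc, blind-test MCC)
    "Extra Tree": (1.0, 0.7565756396515464, 0.24),
    "Random Forest": (1.0, 0.9217438682406724, 0.19),
    "Random Forest2": (0.9474526465723194, 0.9254056246826363, 0.2),
}

gaps = {}
for name, (train, test, blind) in logged.items():
    cv_gap = train - test      # overfitting visible within cross-validation
    blind_gap = test - blind   # drop from CV estimate to the blind set
    gaps[name] = (cv_gap, blind_gap)
    print(f"{name:15s} train-test: {cv_gap:.3f}  test-blind: {blind_gap:.3f}")
```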

MCC on Blind test: 0.2

Accuracy on Blind test: 0.5

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01942348 0.00846815 0.0079608  0.00791335 0.00764561 0.00800776
 0.00769567 0.00761008 0.00771356 0.00770426]

mean value: 0.009014272689819336

key: score_time
value: [0.01135564 0.00861812 0.0081501  0.00827861 0.00797629 0.00797629
 0.00789309 0.00791764 0.00794172 0.0080111 ]

mean value: 0.008411860466003418

key: test_mcc
value: [0.89342711 0.74984143 0.85714286 0.71428571 0.67900461 0.78571429
 0.64285714 0.71049701 0.75878131 0.74935731]

mean value: 0.7540908782235038

key: train_mcc
value: [0.76073062 0.76464682 0.75244668 0.78078676 0.77655234 0.75249829
 0.76042979 0.77325226 0.78086182 0.77758373]

mean value: 0.7679789125420294

key: test_accuracy
value: [0.94642857 0.875      0.92857143 0.85714286 0.83928571 0.89285714
 0.82142857 0.85454545 0.87272727 0.87272727]

mean value: 0.8760714285714286

key: train_accuracy
value: [0.88023952 0.88223553 0.8762475  0.89021956 0.88822355 0.8762475
 0.88023952 0.88645418 0.89043825 0.88844622]

mean value: 0.8838991340029105

key: test_fscore
value: [0.94545455 0.86792453 0.92857143 0.85714286 0.84210526 0.89285714
 0.82142857 0.84615385 0.88135593 0.8627451 ]

mean value: 0.8745739213310778

key: train_fscore
value: [0.88047809 0.88223553 0.87449393 0.89021956 0.8875502  0.875
 0.87804878 0.88667992 0.88933602 0.88932806]

mean value: 0.8833370085701109

key: test_precision
value: [0.92857143 0.88461538 0.92857143 0.85714286 0.82758621 0.89285714
 0.82142857 0.88       0.8125     0.91666667]

mean value: 0.8749939686750031

key: train_precision
value: [0.87007874 0.87351779 0.87449393 0.87795276 0.88047809 0.87148594
 0.88163265 0.8745098  0.8875502  0.87209302]

mean value: 0.8763792922216086

key: test_recall
value: [0.96296296 0.85185185 0.92857143 0.85714286 0.85714286 0.89285714
 0.82142857 0.81481481 0.96296296 0.81481481]

mean value: 0.8764550264550264

key: train_recall
value: [0.89112903 0.89112903 0.87449393 0.90283401 0.89473684 0.87854251
 0.87449393 0.89919355 0.89112903 0.90725806]

mean value: 0.8904939924252318

key: test_roc_auc
value: [0.94699872 0.87420179 0.92857143 0.85714286 0.83928571 0.89285714
 0.82142857 0.85383598 0.87433862 0.87169312]

mean value: 0.8760353950009122

key: train_roc_auc
value: [0.88034712 0.88232341 0.87622334 0.89039338 0.8883133  0.87627913
 0.88016035 0.88660465 0.89044641 0.8886684 ]

mean value: 0.8839759495598507

key: test_jcc
value: [0.89655172 0.76666667 0.86666667 0.75       0.72727273 0.80645161
 0.6969697  0.73333333 0.78787879 0.75862069]

mean value: 0.7790411905484208

key: train_jcc
value: [0.78647687 0.78928571 0.77697842 0.80215827 0.79783394 0.77777778
 0.7826087  0.79642857 0.80072464 0.80071174]

mean value: 0.7910984634590573

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.13241124 0.04881454 0.04856825 0.05027437 0.04823184 0.0521431
 0.05157948 0.04875374 0.05209494 0.05204916]

mean value: 0.058492064476013184

key: score_time
value: [0.01028204 0.01021361 0.00997639 0.00988674 0.00968552 0.00974798
 0.00975752 0.00969625 0.01008534 0.00979543]

mean value: 0.009912681579589844

key: test_mcc
value: [1.         0.9284802  0.89342711 0.93094934 0.89342711 0.93094934
 0.92857143 0.89153439 1.         0.89139151]

mean value: 0.9288730432045526

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.96428571 0.94642857 0.96428571 0.94642857 0.96428571
 0.96428571 0.94545455 1.         0.94545455]

mean value: 0.9640909090909091

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.96296296 0.94736842 0.96551724 0.94736842 0.96296296
 0.96428571 0.94545455 1.         0.94339623]

mean value: 0.9639316495565854

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96296296 0.93103448 0.93333333 0.93103448 1.
 0.96428571 0.92857143 1.         0.96153846]

mean value: 0.9612760866209142

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96296296 0.96428571 1.         0.96428571 0.92857143
 0.96428571 0.96296296 1.         0.92592593]

mean value: 0.9673280423280424

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9642401  0.94642857 0.96428571 0.94642857 0.96428571
 0.96428571 0.9457672  1.         0.94510582]

mean value: 0.9640827403758438

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.92857143 0.9        0.93333333 0.9        0.92857143
 0.93103448 0.89655172 1.         0.89285714]

mean value: 0.9310919540229885

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.37

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])
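The pipeline printed above scales the numerical columns with MinMaxScaler, one-hot encodes the seven categorical columns, and passes any other columns through unchanged (remainder='passthrough'). Below is a minimal sketch of an equivalent construction. The toy DataFrame, its random values, and the use of only four of the logged columns are illustrative assumptions. They are not the script's actual data or code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy frame: two numerical and two categorical columns named
# after features in the log; values are random, not the real rpoB data.
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "deepddg": rng.normal(0, 1, n),
    "ss_class": rng.choice(["H", "E", "C"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = rng.integers(0, 2, n)

# Split columns by dtype, as the logged 'num' / 'cat' Index objects suggest.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Scale numerical columns to [0, 1] and one-hot encode categorical ones;
# any column in neither list is passed through unchanged.
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", LinearDiscriminantAnalysis())])
pipe.fit(X, y)
preds = pipe.predict(X.iloc[:5])
print(preds)
```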

key: fit_time
value: [0.01652741 0.03145742 0.04161358 0.04130459 0.0417943  0.04165006
 0.03727698 0.04370403 0.04243636 0.04148102]

mean value: 0.037924575805664065

key: score_time
value: [0.01045966 0.01935315 0.02035856 0.02224278 0.02057624 0.0211885
 0.01432323 0.01109576 0.01095986 0.01956701]

mean value: 0.017012476921081543

key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.82195294 0.67900461 0.89342711
 0.67900461 0.78174603 0.71735629 0.82269299]

mean value: 0.8041144809910427

key: train_mcc
value: [0.86087113 0.84902508 0.84841579 0.8325975  0.86886449 0.85702217
 0.85676029 0.85318007 0.84497964 0.84964116]

mean value: 0.8521357324796069

key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.91071429 0.83928571 0.94642857
 0.83928571 0.89090909 0.85454545 0.90909091]

mean value: 0.9011688311688312

key: train_accuracy
value: [0.93013972 0.9241517  0.9241517  0.91616766 0.93413174 0.92814371
 0.92814371 0.92629482 0.92231076 0.92430279]

mean value: 0.9257938306653625

key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.9122807  0.84210526 0.94545455
 0.84210526 0.88888889 0.86206897 0.90196078]

mean value: 0.9012178924941413

key: train_fscore
value: [0.93069307 0.92490119 0.92369478 0.916      0.93439364 0.92857143
 0.92828685 0.92673267 0.92246521 0.92519685]

mean value: 0.9260935685934734

key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.89655172 0.82758621 0.96296296
 0.82758621 0.88888889 0.80645161 0.95833333]

mean value: 0.895350682461361

key: train_precision
value: [0.91439689 0.90697674 0.91633466 0.90513834 0.91796875 0.91050584
 0.91372549 0.91050584 0.90980392 0.90384615]

mean value: 0.9109202621383721

key: test_recall
value: [1.         0.88888889 0.96428571 0.92857143 0.85714286 0.92857143
 0.85714286 0.88888889 0.92592593 0.85185185]

mean value: 0.9091269841269841

key: train_recall
value: [0.94758065 0.94354839 0.93117409 0.92712551 0.951417   0.94736842
 0.94331984 0.94354839 0.93548387 0.94758065]

mean value: 0.9418146793783466

key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.91071429 0.83928571 0.94642857
 0.83928571 0.89087302 0.85582011 0.90806878]

mean value: 0.9012383689107828

key: train_roc_auc
value: [0.93031206 0.92434336 0.92424846 0.91631866 0.93436992 0.92840862
 0.92835283 0.9264986  0.92246634 0.92457772]

mean value: 0.925989658944721

key: test_jcc
value: [0.93103448 0.82758621 0.9        0.83870968 0.72727273 0.89655172
 0.72727273 0.8        0.75757576 0.82142857]

mean value: 0.8227431874762242

key: train_jcc
value: [0.87037037 0.86029412 0.85820896 0.84501845 0.87686567 0.86666667
 0.866171   0.86346863 0.85608856 0.86080586]

mean value: 0.8623958291829558
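Each key/value/mean block above has the shape of scikit-learn's cross_validate output with return_train_score=True. fit_time and score_time come first, followed by a test_ and a train_ entry for each scorer, with one value per fold. With 557 rows and 10 folds, each test fold holds 55 or 56 rows. That matches accuracies such as 54/56 = 0.96428571 and 49/55 = 0.89090909 in the LDA run. The sketch below reproduces this layout on synthetic data. The scorer dictionary, the StratifiedKFold splitter and the LogisticRegression stand-in are assumptions, not taken from the script. Scoring roc_auc on hard labels is likewise only an assumption: it is one possible reason why test_roc_auc tracks test_accuracy so closely here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumed scoring dict; the names mirror the log's keys (mcc, fscore, jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the 557-row training matrix.
X, y = make_classification(n_samples=557, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```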

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7
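The blind-test lines report MCC and accuracy rounded to two decimals. MCC uses all four cells of the confusion matrix, so it can fall well below accuracy when most errors land on one class. This LDA run shows the effect: accuracy 0.7 against an MCC of 0.25. The worked example below uses hypothetical labels. It computes MCC from the confusion-matrix formula, checks the result against scikit-learn, and prints both metrics in the log's format.

```python
import math

from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Hypothetical blind-set labels and predictions (not from the log).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
acc = accuracy_score(y_true, y_pred)

print(f"MCC on Blind test: {round(mcc, 2)}")
print(f"Accuracy on Blind test: {round(acc, 2)}")
```

Here 7 of 10 predictions are correct, but two of the three positives are missed, so MCC comes out near 0.22.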

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02321148 0.00787091 0.00767159 0.00766253 0.0082829  0.00821114
 0.00813293 0.00844979 0.00815034 0.00824094]

mean value: 0.00958845615386963

key: score_time
value: [0.00829506 0.00817847 0.00787377 0.00799441 0.0085752  0.0085206
 0.00833821 0.00848675 0.00863934 0.00852776]

mean value: 0.008342957496643067

key: test_mcc
value: [0.89342711 0.74984143 0.89342711 0.71428571 0.67900461 0.82195294
 0.71611487 0.71049701 0.75878131 0.74935731]

mean value: 0.7686689426300658

key: train_mcc
value: [0.76059032 0.77655946 0.76451932 0.78061298 0.78453717 0.7684682
 0.78839993 0.78902126 0.77686055 0.77734028]

mean value: 0.7766909479185805

key: test_accuracy
value: [0.94642857 0.875      0.94642857 0.85714286 0.83928571 0.91071429
 0.85714286 0.85454545 0.87272727 0.87272727]

mean value: 0.8832142857142857

key: train_accuracy
value: [0.88023952 0.88822355 0.88223553 0.89021956 0.89221557 0.88423154
 0.89421158 0.89442231 0.88844622 0.88844622]

mean value: 0.8882891587343242

key: test_fscore
value: [0.94545455 0.86792453 0.94736842 0.85714286 0.84210526 0.9122807
 0.86206897 0.84615385 0.88135593 0.8627451 ]

mean value: 0.8824600158777894

key: train_fscore
value: [0.88       0.888      0.88128773 0.88977956 0.89156627 0.88306452
 0.89292929 0.89421158 0.88709677 0.88888889]

mean value: 0.8876824599523696

key: test_precision
value: [0.92857143 0.88461538 0.93103448 0.85714286 0.82758621 0.89655172
 0.83333333 0.88       0.8125     0.91666667]

mean value: 0.8768002084122773

key: train_precision
value: [0.87301587 0.88095238 0.876      0.88095238 0.88446215 0.87951807
 0.89112903 0.88537549 0.88709677 0.875     ]

mean value: 0.8813502159126972

key: test_recall
value: [0.96296296 0.85185185 0.96428571 0.85714286 0.85714286 0.92857143
 0.89285714 0.81481481 0.96296296 0.81481481]

mean value: 0.8907407407407407

key: train_recall
value: [0.88709677 0.89516129 0.88663968 0.89878543 0.89878543 0.88663968
 0.89473684 0.90322581 0.88709677 0.90322581]

mean value: 0.8941393496147316

key: test_roc_auc
value: [0.94699872 0.87420179 0.94642857 0.85714286 0.83928571 0.91071429
 0.85714286 0.85383598 0.87433862 0.87169312]

mean value: 0.8831782521437694

key: train_roc_auc
value: [0.88030728 0.88829211 0.88229622 0.89033759 0.8923061  0.88426472
 0.89421881 0.89452629 0.88843028 0.88862078]

mean value: 0.8883600174671026

key: test_jcc
value: [0.89655172 0.76666667 0.9        0.75       0.72727273 0.83870968
 0.75757576 0.73333333 0.78787879 0.75862069]

mean value: 0.7916609363939731

key: train_jcc
value: [0.78571429 0.79856115 0.78776978 0.80144404 0.80434783 0.79061372
 0.80656934 0.80866426 0.79710145 0.8       ]

mean value: 0.7980785861054747

MCC on Blind test: 0.29

Accuracy on Blind test: 0.71

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0116744  0.01263475 0.0133822  0.01279974 0.01338243 0.01445627
 0.01424313 0.01156998 0.01224852 0.01223946]

mean value: 0.012863087654113769

key: score_time
value: [0.00871682 0.00998259 0.00996375 0.01038527 0.01055145 0.01046562
 0.01043034 0.01041961 0.01059127 0.01041579]

mean value: 0.010192251205444336

key: test_mcc
value: [0.89827421 0.85696041 0.89342711 0.85714286 0.59628479 0.82195294
 0.79385662 0.78174603 0.85449735 0.81854376]

mean value: 0.8172686093718741

key: train_mcc
value: [0.83135263 0.909012   0.87714464 0.89219562 0.81343828 0.85235242
 0.86715942 0.86343244 0.87040305 0.89653312]

mean value: 0.8673023622935534

key: test_accuracy
value: [0.94642857 0.92857143 0.94642857 0.92857143 0.78571429 0.91071429
 0.89285714 0.89090909 0.92727273 0.90909091]

mean value: 0.9066558441558441

key: train_accuracy
value: [0.91217565 0.95409182 0.93812375 0.94610778 0.9001996  0.9261477
 0.93213573 0.93027888 0.93426295 0.94820717]

mean value: 0.9321731039912208

key: test_fscore
value: [0.94736842 0.92592593 0.94736842 0.92857143 0.8125     0.90909091
 0.9        0.88888889 0.92592593 0.90566038]

mean value: 0.9091300297866832

key: train_fscore
value: [0.91666667 0.95257732 0.93861386 0.94523327 0.9070632  0.92555332
 0.93385214 0.92631579 0.93110647 0.948     ]

mean value: 0.9324982031673843

key: test_precision
value: [0.9        0.92592593 0.93103448 0.92857143 0.72222222 0.92592593
 0.84375    0.88888889 0.92592593 0.92307692]

mean value: 0.8915321723295861

key: train_precision
value: [0.86428571 0.97468354 0.91860465 0.94715447 0.83848797 0.92
 0.8988764  0.969163   0.96536797 0.94047619]

mean value: 0.9237099909738861

key: test_recall
value: [1.         0.92592593 0.96428571 0.92857143 0.92857143 0.89285714
 0.96428571 0.88888889 0.92592593 0.88888889]

mean value: 0.9308201058201058

key: train_recall
value: [0.97580645 0.93145161 0.95951417 0.94331984 0.98785425 0.93117409
 0.97165992 0.88709677 0.89919355 0.95564516]

mean value: 0.9442715815593574

key: test_roc_auc
value: [0.94827586 0.9284802  0.94642857 0.92857143 0.78571429 0.91071429
 0.89285714 0.89087302 0.92724868 0.90873016]

mean value: 0.9067893632548805

key: train_roc_auc
value: [0.91280441 0.9538681  0.9384185  0.94606937 0.90140744 0.92621697
 0.93268035 0.92976886 0.93384874 0.94829502]

mean value: 0.9323377764010412

key: test_jcc
value: [0.9        0.86206897 0.9        0.86666667 0.68421053 0.83333333
 0.81818182 0.8        0.86206897 0.82758621]

mean value: 0.8354116482428642

key: train_jcc
value: [0.84615385 0.90944882 0.88432836 0.89615385 0.82993197 0.86142322
 0.87591241 0.8627451  0.87109375 0.90114068]

mean value: 0.873833200438617

MCC on Blind test: 0.18

Accuracy on Blind test: 0.49

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01433849 0.01301265 0.0144105  0.01522279 0.0132041  0.01305103
 0.01268101 0.01373744 0.0126369  0.01393795]

mean value: 0.013623285293579101

key: score_time
value: [0.01049781 0.01047158 0.01053524 0.0104003  0.01044488 0.01042581
 0.01039124 0.01047635 0.01046515 0.01041508]

mean value: 0.010452342033386231

key: test_mcc
value: [0.93069263 0.85951469 0.82618439 0.89802651 0.75047877 0.78571429
 0.73127242 0.85695439 0.92962225 0.8565805 ]

mean value: 0.8425040848772626

key: train_mcc
value: [0.87181962 0.83135263 0.85503558 0.88967789 0.91283821 0.88589338
 0.86743952 0.82906495 0.86468284 0.92034415]

mean value: 0.8728148763208086

key: test_accuracy
value: [0.96428571 0.92857143 0.91071429 0.94642857 0.875      0.89285714
 0.85714286 0.92727273 0.96363636 0.92727273]

mean value: 0.9193181818181818

key: train_accuracy
value: [0.93413174 0.91217565 0.9241517  0.94411178 0.95608782 0.94211577
 0.93213573 0.91035857 0.93027888 0.96015936]

mean value: 0.9345706992389723

key: test_fscore
value: [0.96153846 0.92857143 0.90566038 0.94915254 0.87719298 0.89285714
 0.84       0.92857143 0.96153846 0.92307692]

mean value: 0.9168159748341358

key: train_fscore
value: [0.93023256 0.91666667 0.91774892 0.94488189 0.95454545 0.94302554
 0.9279661  0.91525424 0.92569002 0.95983936]

mean value: 0.9335850744783595

key: test_precision
value: [1.         0.89655172 0.96       0.90322581 0.86206897 0.89285714
 0.95454545 0.89655172 1.         0.96      ]

mean value: 0.9325800817647314

key: train_precision
value: [0.97777778 0.86428571 0.98604651 0.91954023 0.97468354 0.91603053
 0.97333333 0.85865724 0.97757848 0.956     ]

mean value: 0.940393336471731

key: test_recall
value: [0.92592593 0.96296296 0.85714286 1.         0.89285714 0.89285714
 0.75       0.96296296 0.92592593 0.88888889]

mean value: 0.905952380952381

key: train_recall
value: [0.88709677 0.97580645 0.8582996  0.97165992 0.93522267 0.97165992
 0.88663968 0.97983871 0.87903226 0.96370968]

mean value: 0.930896565234426

key: test_roc_auc
value: [0.96296296 0.92975734 0.91071429 0.94642857 0.875      0.89285714
 0.85714286 0.92791005 0.96296296 0.9265873 ]

mean value: 0.9192323481116584

key: train_roc_auc
value: [0.93366696 0.91280441 0.92324429 0.94449138 0.95580031 0.94252287
 0.93150881 0.9111792  0.92967361 0.9602013 ]

mean value: 0.9345093140199082

key: test_jcc
value: [0.92592593 0.86666667 0.82758621 0.90322581 0.78125    0.80645161
 0.72413793 0.86666667 0.92592593 0.85714286]

mean value: 0.8484979599613915

key: train_jcc
value: [0.86956522 0.84615385 0.848      0.89552239 0.91304348 0.89219331
 0.86561265 0.84375    0.86166008 0.92277992]

mean value: 0.8758280888468557

MCC on Blind test: 0.23

Accuracy on Blind test: 0.67

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10799265 0.09333324 0.09343123 0.09375167 0.09355068 0.09351802
 0.09371734 0.09361362 0.09355211 0.09383702]

mean value: 0.09502975940704346

key: score_time
value: [0.01410651 0.01418447 0.01437783 0.01411939 0.01418042 0.01420903
 0.01414371 0.01427364 0.01410794 0.01541471]

mean value: 0.014311766624450684

key: test_mcc
value: [0.96481304 0.89315584 0.96490128 0.89802651 0.85933785 0.93094934
 0.96490128 0.89153439 1.         0.92724868]

mean value: 0.9294868200199901

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98214286 0.94642857 0.98214286 0.94642857 0.92857143 0.96428571
 0.98214286 0.94545455 1.         0.96363636]

mean value: 0.9641233766233765

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98113208 0.94339623 0.98245614 0.94915254 0.93103448 0.96296296
 0.98245614 0.94545455 1.         0.96296296]

mean value: 0.964100807910052

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96153846 0.96551724 0.90322581 0.9        1.
 0.96551724 0.92857143 1.         0.96296296]

mean value: 0.9587333142283087

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96296296 0.92592593 1.         1.         0.96428571 0.92857143
 1.         0.96296296 1.         0.96296296]

mean value: 0.9707671957671957

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98148148 0.94572158 0.98214286 0.94642857 0.92857143 0.96428571
 0.98214286 0.9457672  1.         0.96362434]

mean value: 0.9640166028097062

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96296296 0.89285714 0.96551724 0.90322581 0.87096774 0.92857143
 0.96551724 0.89655172 1.         0.92857143]

mean value: 0.9314742718246611

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.39
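The Bagging Classifier run below emits "Some inputs do not have OOB scores", followed by a true_divide RuntimeWarning. BaggingClassifier defaults to n_estimators=10. With oob_score=True, any training row that is drawn into every bootstrap sample never receives an out-of-bag prediction. Its OOB vote total is then zero, so the normalisation step divides by zero. The sketch below runs on synthetic data and shows the warning disappearing once more estimators are used. Five estimators are used so the effect appears reliably. The data, the estimator counts and the helper oob_warnings are illustrative assumptions.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)


def oob_warnings(n_estimators):
    """Hypothetical helper: fit bagging with OOB scoring, return warning texts."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        BaggingClassifier(n_estimators=n_estimators, oob_score=True,
                          random_state=42).fit(X, y)
    return [str(w.message) for w in caught]


few = oob_warnings(5)     # few bags: many rows appear in every bootstrap sample
many = oob_warnings(100)  # enough bags that every row is out-of-bag at least once
print(any("do not have OOB scores" in m for m in few))
print(any("do not have OOB scores" in m for m in many))
```

Raising n_estimators for the 'Bagging Classifier' entry would be one way to obtain usable OOB estimates.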

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03909659 0.04021478 0.03637838 0.04280972 0.04759765 0.04407358
 0.04242229 0.04342175 0.03089213 0.04606533]

mean value: 0.04129722118377686

key: score_time
value: [0.02080131 0.02186036 0.0172112  0.03137207 0.02341676 0.0170188
 0.03378963 0.01608229 0.01641321 0.03630662]

mean value: 0.02342722415924072

key: test_mcc
value: [1.         0.85696041 0.92857143 0.93094934 0.89342711 0.96490128
 0.96490128 0.89153439 1.         0.89139151]

mean value: 0.9322636750479738

key: train_mcc
value: [0.98803016 0.98403035 0.99204516 0.99204516 0.99204692 0.98803016
 0.99201441 0.99602309 0.98409121 0.99203073]

mean value: 0.9900387331545668

key: test_accuracy
value: [1.         0.92857143 0.96428571 0.96428571 0.94642857 0.98214286
 0.98214286 0.94545455 1.         0.94545455]

mean value: 0.9658766233766234

key: train_accuracy
value: [0.99401198 0.99201597 0.99600798 0.99600798 0.99600798 0.99401198
 0.99600798 0.99800797 0.99203187 0.99601594]

mean value: 0.9950127633179855

key: test_fscore
value: [1.         0.92592593 0.96428571 0.96551724 0.94736842 0.98181818
 0.98245614 0.94545455 1.         0.94339623]

mean value: 0.9656222396682281

key: train_fscore
value: [0.99393939 0.99193548 0.99593496 0.99593496 0.99596774 0.99393939
 0.99595142 0.9979798  0.99190283 0.99596774]

mean value: 0.9949453723311854

key: test_precision
value: [1.         0.92592593 0.96428571 0.93333333 0.93103448 1.
 0.96551724 0.92857143 1.         0.96153846]

mean value: 0.9610206587792794

key: train_precision
value: [0.99595142 0.99193548 1.         1.         0.99196787 0.99193548
 0.99595142 1.         0.99593496 0.99596774]

mean value: 0.9959644374521054

key: test_recall
value: [1.         0.92592593 0.96428571 1.         0.96428571 0.96428571
 1.         0.96296296 1.         0.92592593]

mean value: 0.9707671957671957

key: train_recall
value: [0.99193548 0.99193548 0.99190283 0.99190283 1.         0.99595142
 0.99595142 0.99596774 0.98790323 0.99596774]

mean value: 0.9939418179443646

key: test_roc_auc
value: [1.         0.9284802  0.96428571 0.96428571 0.94642857 0.98214286
 0.98214286 0.9457672  1.         0.94510582]

mean value: 0.9658638934501004

key: train_roc_auc
value: [0.99399146 0.99201517 0.99595142 0.99595142 0.99606299 0.9940387
 0.9960072  0.99798387 0.99198311 0.99601537]

mean value: 0.9950000708407828

key: test_jcc
value: [1.         0.86206897 0.93103448 0.93333333 0.9        0.96428571
 0.96551724 0.89655172 1.         0.89285714]

mean value: 0.9345648604269294

key: train_jcc
value: [0.98795181 0.984      0.99190283 0.99190283 0.99196787 0.98795181
 0.99193548 0.99596774 0.98393574 0.99196787]

mean value: 0.9899483994224252

MCC on Blind test: 0.14

Accuracy on Blind test: 0.37
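The OOB `UserWarning`/`RuntimeWarning` pair emitted for the Bagging Classifier is what one would expect from scikit-learn's default `n_estimators=10`, which this run uses since the logged model func sets no `n_estimators`. With bootstrap sampling, a row is in a given bag with probability 1 - (1 - 1/n)^n ≈ 1 - e^-1 ≈ 0.632. It gets no out-of-bag prediction only if it is in every bag, which happens with probability about 0.632^10 ≈ 0.01. With roughly 501 training rows per fold (10-fold CV over 557 rows), that is about 5 rows per fit. Those rows give a 0/0 division in the OOB averaging, hence the `invalid value` warning. The sketch below computes that expectation and fits a bagger with more estimators on synthetic data. The value 100 is an illustrative choice, not a tuned setting.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier


def expected_rows_without_oob(n_rows: int, n_estimators: int) -> float:
    """Expected number of rows drawn into every bootstrap sample (no OOB score)."""
    p_in_bag = 1.0 - (1.0 - 1.0 / n_rows) ** n_rows  # ~ 1 - e^-1 ~ 0.632
    return n_rows * p_in_bag ** n_estimators


# ~501 rows per training fold (10-fold CV over 557 rows)
print(expected_rows_without_oob(501, 10))   # roughly 5 rows
print(expected_rows_without_oob(501, 100))  # effectively zero

# Synthetic data; with 100 estimators every row should get an OOB prediction
X, y = make_classification(n_samples=501, n_features=20, random_state=42)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf = BaggingClassifier(n_estimators=100, oob_score=True,
                            random_state=42).fit(X, y)
oob_warnings = [w for w in caught if "OOB" in str(w.message)]
print(len(oob_warnings), round(clf.oob_score_, 3))
```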

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.17011619 0.17630982 0.1871922  0.19212961 0.17157412 0.15598917
 0.15701985 0.17155218 0.17326117 0.15419507]

mean value: 0.17093393802642823

key: score_time
value: [0.02660775 0.02126074 0.02073812 0.01927352 0.02623224 0.02367377
 0.0206089  0.02606893 0.02499342 0.01983643]

mean value: 0.02292938232421875

key: test_mcc
value: [0.89342711 0.74984143 0.89342711 0.71428571 0.71428571 0.78571429
 0.68250015 0.78174603 0.72754449 0.81854376]

mean value: 0.7761315807091642

key: train_mcc
value: [0.83651026 0.85265474 0.84449262 0.84078809 0.85265708 0.84078809
 0.84078809 0.84907279 0.86501334 0.85318007]

mean value: 0.8475945177079469

key: test_accuracy
value: [0.94642857 0.875      0.94642857 0.85714286 0.85714286 0.89285714
 0.83928571 0.89090909 0.85454545 0.90909091]

mean value: 0.8868831168831168

key: train_accuracy
value: [0.91816367 0.9261477  0.92215569 0.92015968 0.9261477  0.92015968
 0.92015968 0.92430279 0.93227092 0.92629482]

mean value: 0.9235962338271664

key: test_fscore
value: [0.94545455 0.86792453 0.94736842 0.85714286 0.85714286 0.89285714
 0.84745763 0.88888889 0.86666667 0.90566038]

mean value: 0.8876563911984611

key: train_fscore
value: [0.91816367 0.92644135 0.92184369 0.92031873 0.9261477  0.92031873
 0.92031873 0.92460317 0.93253968 0.92673267]

mean value: 0.9237428122217914

key: test_precision
value: [0.92857143 0.88461538 0.93103448 0.85714286 0.85714286 0.89285714
 0.80645161 0.88888889 0.78787879 0.92307692]

mean value: 0.8757660365836116

key: train_precision
value: [0.90909091 0.91372549 0.91269841 0.90588235 0.91338583 0.90588235
 0.90588235 0.91015625 0.91796875 0.91050584]

mean value: 0.9105178534156458

key: test_recall
value: [0.96296296 0.85185185 0.96428571 0.85714286 0.85714286 0.89285714
 0.89285714 0.88888889 0.96296296 0.88888889]

mean value: 0.901984126984127

key: train_recall
value: [0.92741935 0.93951613 0.93117409 0.93522267 0.93927126 0.93522267
 0.93522267 0.93951613 0.94758065 0.94354839]

mean value: 0.9373694005485177

key: test_roc_auc
value: [0.94699872 0.87420179 0.94642857 0.85714286 0.85714286 0.89285714
 0.83928571 0.89087302 0.85648148 0.90873016]

mean value: 0.8870142309797482

key: train_roc_auc
value: [0.91825513 0.9262798  0.92227996 0.92036724 0.92632854 0.92036724
 0.92036724 0.92448247 0.93245174 0.9264986 ]

mean value: 0.9237677975946037

key: test_jcc
value: [0.89655172 0.76666667 0.9        0.75       0.75       0.80645161
 0.73529412 0.8        0.76470588 0.82758621]

mean value: 0.7997256210604375

key: train_jcc
value: [0.84870849 0.86296296 0.85501859 0.85239852 0.86245353 0.85239852
 0.85239852 0.8597786  0.87360595 0.86346863]

mean value: 0.8583192321390376

MCC on Blind test: 0.29

Accuracy on Blind test: 0.73
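Every `Running model pipeline:` entry prints the same preprocessing. A `ColumnTransformer` min-max scales the numerical columns, one-hot encodes the categorical columns and passes any remaining column through. The classifier then follows as the `model` step. Below is a minimal sketch of that structure on a toy DataFrame. The column names `rsa`, `deepddg`, `ss_class` and `active_site` are taken from the printed indexes, but the data, its size and the choice of columns are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 60
# Toy data: two numerical and two categorical columns named as in the log
X = pd.DataFrame({
    "rsa": rng.random(n),
    "deepddg": rng.normal(size=n),
    "ss_class": rng.choice(["H", "E", "L"], size=n),
    "active_site": rng.choice(["0", "1"], size=n),
})
y = (X["deepddg"] > 0).astype(int)

# Split columns by dtype, mirroring the 'num' / 'cat' groups in the log
num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = X.select_dtypes(exclude="number").columns.tolist()

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    ("model", GaussianProcessClassifier(random_state=42)),
])
pipe.fit(X, y)
print(pipe.score(X, y))
```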

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.25510955 0.24359798 0.24252176 0.24263096 0.24308658 0.2433722
 0.24473262 0.24413776 0.24478555 0.24284172]

mean value: 0.2446816682815552

key: score_time
value: [0.00862026 0.00837231 0.00834084 0.00830126 0.00861955 0.00825286
 0.00839138 0.00829506 0.00852728 0.00854683]

mean value: 0.008426761627197266

key: test_mcc
value: [1.         0.9284802  0.92857143 0.93094934 0.85933785 0.96490128
 0.96490128 0.89153439 1.         0.8565805 ]

mean value: 0.9325256275611022

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.96428571 0.96428571 0.96428571 0.92857143 0.98214286
 0.98214286 0.94545455 1.         0.92727273]

mean value: 0.9658441558441558

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.96296296 0.96428571 0.96551724 0.93103448 0.98181818
 0.98245614 0.94545455 1.         0.92307692]

mean value: 0.9656606192087136

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96296296 0.96428571 0.93333333 0.9        1.
 0.96551724 0.92857143 1.         0.96      ]

mean value: 0.961467068053275

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96296296 0.96428571 1.         0.96428571 0.96428571
 1.         0.96296296 1.         0.88888889]

mean value: 0.9707671957671957

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9642401  0.96428571 0.96428571 0.92857143 0.98214286
 0.98214286 0.9457672  1.         0.9265873 ]

mean value: 0.9658023170954205

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.92857143 0.93103448 0.93333333 0.87096774 0.96428571
 0.96551724 0.89655172 1.         0.85714286]

mean value: 0.9347404523544679

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.3
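For several models the blind-test MCC is far below the cross-validation MCC. For Gradient Boosting, the mean test_mcc is about 0.93, against 0.1 on the blind set. Accuracy and MCC can also diverge on an imbalanced blind set. For reference, binary MCC is (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch checks that formula against scikit-learn's `matthews_corrcoef` on made-up labels. It assumes the logged blind-test MCC comes from that function, which the log itself does not confirm.

```python
import math

from sklearn.metrics import confusion_matrix, matthews_corrcoef


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Same convention as scikit-learn: return 0 when the denominator is 0
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up blind-test labels, for illustration only
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(mcc_from_counts(tp, tn, fp, fn), matthews_corrcoef(y_true, y_pred))
```

Here TP=3, TN=2, FP=4, FN=1. Accuracy is 0.5, while MCC = 2/sqrt(504) ≈ 0.09.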

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01189399 0.01387286 0.01450205 0.01408744 0.01389623 0.0166738
 0.01422548 0.01454496 0.01396155 0.01419568]

mean value: 0.014185404777526856

key: score_time
value: [0.01086521 0.01088333 0.01085353 0.01082253 0.01088142 0.01098228
 0.01087499 0.01154423 0.01157999 0.01079154]

mean value: 0.011007905006408691

key: test_mcc
value: [0.9284802  0.54871911 0.78772636 0.71611487 0.47187011 0.60753044
 0.68250015 0.60268595 0.81878307 0.67602163]

mean value: 0.6840431895998464

key: train_mcc
value: [0.80440606 0.81032473 0.79940894 0.79646944 0.70336606 0.78901365
 0.78773489 0.80486309 0.79845601 0.7610531 ]

mean value: 0.7855095967948068

key: test_accuracy
value: [0.96428571 0.76785714 0.89285714 0.85714286 0.73214286 0.80357143
 0.83928571 0.8        0.90909091 0.83636364]

mean value: 0.8402597402597403

key: train_accuracy
value: [0.90219561 0.90419162 0.89620758 0.89820359 0.83433134 0.89221557
 0.89221557 0.90039841 0.89840637 0.87848606]

mean value: 0.88968517148969

key: test_fscore
value: [0.96296296 0.72340426 0.88888889 0.85185185 0.70588235 0.80701754
 0.83018868 0.78431373 0.90909091 0.82352941]

mean value: 0.8287130581414772

key: train_fscore
value: [0.90060852 0.89958159 0.88695652 0.89570552 0.8        0.88412017
 0.88510638 0.89361702 0.89352818 0.86993603]

mean value: 0.8809159946199812

key: test_precision
value: [0.96296296 0.85       0.92307692 0.88461538 0.7826087  0.79310345
 0.88       0.83333333 0.89285714 0.875     ]

mean value: 0.8677557890773783

key: train_precision
value: [0.90612245 0.93478261 0.95774648 0.90495868 0.98809524 0.94063927
 0.93273543 0.94594595 0.92640693 0.92307692]

mean value: 0.9360509943174828

key: test_recall
value: [0.96296296 0.62962963 0.85714286 0.82142857 0.64285714 0.82142857
 0.78571429 0.74074074 0.92592593 0.77777778]

mean value: 0.7965608465608466

key: train_recall
value: [0.89516129 0.86693548 0.82591093 0.88663968 0.67206478 0.8340081
 0.84210526 0.84677419 0.86290323 0.82258065]

mean value: 0.8355083583648949

key: test_roc_auc
value: [0.9642401  0.76309068 0.89285714 0.85714286 0.73214286 0.80357143
 0.83928571 0.7989418  0.90939153 0.83531746]

mean value: 0.8395981572705711

key: train_roc_auc
value: [0.9021261  0.90382347 0.89523893 0.89804425 0.83209538 0.8914135
 0.89152507 0.89976505 0.89798705 0.87782576]

mean value: 0.8889844552398375

key: test_jcc
value: [0.92857143 0.56666667 0.8        0.74193548 0.54545455 0.67647059
 0.70967742 0.64516129 0.83333333 0.7       ]

mean value: 0.7147270755809655

key: train_jcc
value: [0.81918819 0.81749049 0.796875   0.81111111 0.66666667 0.79230769
 0.79389313 0.80769231 0.80754717 0.76981132]

mean value: 0.7882583084293304

MCC on Blind test: 0.32

Accuracy on Blind test: 0.72
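QDA emits `Variables are collinear` when, within a class, the centered feature matrix is rank-deficient. One plausible contributor in this pipeline, not confirmed by the log, is the `OneHotEncoder()` step. With no category dropped, the dummy columns for each categorical feature sum to 1 in every row. After centering they are therefore linearly dependent, in every class subset as well. The sketch shows the rank drop on a toy `ss_class`-like column and shows that `drop='first'` removes that particular dependency. Collinear numerical features, if any, would still trigger the warning.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column with three levels (illustrative values)
ss_class = np.array([["H"], ["E"], ["L"]] * 7)

full = OneHotEncoder().fit_transform(ss_class).toarray()
dropped = OneHotEncoder(drop="first").fit_transform(ss_class).toarray()


def centered_rank(m: np.ndarray) -> int:
    """Rank of the column-centered matrix, as used for a covariance estimate."""
    return int(np.linalg.matrix_rank(m - m.mean(axis=0)))


print(full.shape[1], centered_rank(full))        # 3 columns, rank 2
print(dropped.shape[1], centered_rank(dropped))  # 2 columns, rank 2
```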

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01984334 0.02952814 0.0294714  0.0295279  0.02948236 0.02966619
 0.02946544 0.02954245 0.02954078 0.02948689]

mean value: 0.02855548858642578

key: score_time
value: [0.01642299 0.01944399 0.02070069 0.01963282 0.01078868 0.02068901
 0.01969433 0.0106039  0.02023768 0.02003217]

mean value: 0.017824625968933104

key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.78772636 0.67900461 0.85933785
 0.71611487 0.78174603 0.71735629 0.81854376]

mean value: 0.8005790004186416

key: train_mcc
value: [0.82071187 0.83279667 0.82071472 0.80065667 0.82921429 0.83720268
 0.82507217 0.81310081 0.82516195 0.81719167]

mean value: 0.822182349120051
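Each "mean value" line appears to be the plain arithmetic mean of the ten per-fold scores printed above it. A quick stdlib check using the Ridge Classifier MCC values copied from this log; the fold-to-fold standard deviation is an extra that the script does not print.

```python
from statistics import mean, stdev

# Per-fold MCC values for 'Ridge Classifier', copied from the log
test_mcc = [0.93103448, 0.82149863, 0.89342711, 0.78772636, 0.67900461,
            0.85933785, 0.71611487, 0.78174603, 0.71735629, 0.81854376]
train_mcc = [0.82071187, 0.83279667, 0.82071472, 0.80065667, 0.82921429,
             0.83720268, 0.82507217, 0.81310081, 0.82516195, 0.81719167]

print(round(mean(test_mcc), 4))   # 0.8006, matches the logged test_mcc mean
print(round(mean(train_mcc), 4))  # 0.8222, matches the logged train_mcc mean
# Spread across folds (not reported by the script)
print(round(stdev(test_mcc), 4))
```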

key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.89285714 0.83928571 0.92857143
 0.85714286 0.89090909 0.85454545 0.90909091]

mean value: 0.8993831168831169

key: train_accuracy
value: [0.91017964 0.91616766 0.91017964 0.9001996  0.91417166 0.91816367
 0.91217565 0.9063745  0.9123506  0.90836653]

mean value: 0.9108329158416235

key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.89655172 0.84210526 0.92592593
 0.86206897 0.88888889 0.86206897 0.90566038]

mean value: 0.900058462320045

key: train_fscore
value: [0.91053678 0.91666667 0.91017964 0.9        0.91485149 0.91881188
 0.91269841 0.90656064 0.91269841 0.90873016]

mean value: 0.9111734073355806

key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.86666667 0.82758621 0.96153846
 0.83333333 0.88888889 0.80645161 0.92307692]

mean value: 0.8892687981898215

key: train_precision
value: [0.89803922 0.90234375 0.8976378  0.88932806 0.89534884 0.89922481
 0.89494163 0.89411765 0.8984375  0.89453125]

mean value: 0.8963950498913893

key: test_recall
value: [1.         0.88888889 0.96428571 0.92857143 0.85714286 0.89285714
 0.89285714 0.88888889 0.92592593 0.88888889]

mean value: 0.9128306878306878

key: train_recall
value: [0.9233871  0.93145161 0.92307692 0.91093117 0.93522267 0.93927126
 0.93117409 0.91935484 0.92741935 0.9233871 ]

mean value: 0.9264676113360324

key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.89285714 0.83928571 0.92857143
 0.85714286 0.89087302 0.85582011 0.90873016]

mean value: 0.899518792191206

key: train_roc_auc
value: [0.91031015 0.91631869 0.91035736 0.90034748 0.91446173 0.91845453
 0.91243744 0.90652781 0.91252858 0.90854394]

mean value: 0.9110287700326485

key: test_jcc
value: [0.93103448 0.82758621 0.9        0.8125     0.72727273 0.86206897
 0.75757576 0.8        0.75757576 0.82758621]

mean value: 0.8203200104493208

key: train_jcc
value: [0.83576642 0.84615385 0.83516484 0.81818182 0.84306569 0.84981685
 0.83941606 0.82909091 0.83941606 0.83272727]

mean value: 0.8368799764712174
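All the per-fold test metrics can be recovered from a fold's confusion matrix. A stdlib sketch using counts inferred (they are not printed in the log) for fold 1 of the Ridge Classifier run: 56 test samples, test_recall 1.0 and test_precision 0.93103448 = 27/29 imply TP=27, FP=2, FN=0, TN=27. The formulas are the standard definitions; that the script computes them identically is an assumption. The fold-1 test_roc_auc (0.96551724 = 28/29) equals the balanced accuracy of these hard predictions, which suggests, but does not prove, that the ROC AUC scorer was given predicted labels rather than decision scores.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "mcc": mcc,
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy
        "roc_auc_hard": (recall + specificity) / 2,
    }


# Fold 1 counts inferred from the log (hypothetical reconstruction)
m = binary_metrics(tp=27, fp=2, fn=0, tn=27)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
# accuracy 0.96428571, mcc 0.93103448, jcc 0.93103448, roc_auc_hard 0.96551724
```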

MCC on Blind test: 0.25

Accuracy on Blind test: 0.71

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.19355822 0.24611878 0.21725583 0.18385339 0.20384955 0.20188379
 0.19162393 0.19133949 0.19330359 0.22055078]

mean value: 0.2043337345123291

key: score_time
value: [0.02099872 0.01083326 0.02049541 0.02122831 0.02145958 0.01939106
 0.01959395 0.02021599 0.01877356 0.0112946 ]

mean value: 0.018428444862365723

key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.78772636 0.67900461 0.89342711
 0.67900461 0.78174603 0.71735629 0.8565805 ]

mean value: 0.8040805738070792

key: train_mcc
value: [0.84902508 0.84902508 0.84856792 0.80065667 0.86474639 0.86116786
 0.85289102 0.8493299  0.83338631 0.84549238]

mean value: 0.8454288618434195

key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.89285714 0.83928571 0.94642857
 0.83928571 0.89090909 0.85454545 0.92727273]

mean value: 0.9012012987012987

key: train_accuracy
value: [0.9241517  0.9241517  0.9241517  0.9001996  0.93213573 0.93013972
 0.9261477  0.92430279 0.91633466 0.92231076]

mean value: 0.9224026051482692

key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.89655172 0.84210526 0.94545455
 0.84210526 0.88888889 0.86206897 0.92307692]

mean value: 0.9017566086088156

key: train_fscore
value: [0.92490119 0.92490119 0.924      0.9        0.93227092 0.93069307
 0.92644135 0.92490119 0.91699605 0.92307692]

mean value: 0.9228181865350266

key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.86666667 0.82758621 0.96296296
 0.82758621 0.88888889 0.80645161 0.96      ]

mean value: 0.8925288433809012

key: train_precision
value: [0.90697674 0.90697674 0.91304348 0.88932806 0.91764706 0.91085271
 0.91015625 0.90697674 0.89922481 0.9034749 ]

mean value: 0.9064657505738394

key: test_recall
value: [1.         0.88888889 0.96428571 0.92857143 0.85714286 0.92857143
 0.85714286 0.88888889 0.92592593 0.88888889]

mean value: 0.9128306878306878

key: train_recall
value: [0.94354839 0.94354839 0.93522267 0.91093117 0.94736842 0.951417
 0.94331984 0.94354839 0.93548387 0.94354839]

mean value: 0.939793652866658

key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.89285714 0.83928571 0.94642857
 0.83928571 0.89087302 0.85582011 0.9265873 ]

mean value: 0.9013045064769203

key: train_roc_auc
value: [0.92434336 0.92434336 0.92430425 0.90034748 0.93234563 0.93043291
 0.92638433 0.9245301  0.91656083 0.9225616 ]

mean value: 0.9226153848348726

key: test_jcc
value: [0.93103448 0.82758621 0.9        0.8125     0.72727273 0.89655172
 0.72727273 0.8        0.75757576 0.85714286]

mean value: 0.8236936483057172

key: train_jcc
value: [0.86029412 0.86029412 0.85873606 0.81818182 0.87313433 0.87037037
 0.86296296 0.86029412 0.84671533 0.85714286]

mean value: 0.8568126077904101

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7
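The two SettingWithCopyWarning messages from rpob_config.py (lines 122 and 125) are raised by calling sort_values(..., inplace=True) on baseline_CT and baseline_BT, which pandas suspects are slices of another DataFrame. How those frames are built is not shown in this log; the sketch below assumes they are filtered subsets of a larger results table. The names results, source_data and model_name are hypothetical. Taking an explicit .copy() of the subset, or rebinding the sorted result instead of sorting in place, avoids the warning.

```python
import pandas as pd

# Hypothetical results table standing in for the script's combined output
results = pd.DataFrame({
    "model_name": ["A", "B", "C"],
    "test_mcc": [0.80, 0.82, 0.78],
    "source_data": ["CT", "CT", "BT"],
})

# A boolean filter may return a view-like slice; .copy() makes it an
# independent DataFrame, so later in-place edits do not warn
baseline_CT = results[results["source_data"] == "CT"].copy()

# Rebinding the name is an alternative to inplace=True
baseline_CT = baseline_CT.sort_values(by=["test_mcc"], ascending=False)

print(baseline_CT["model_name"].tolist())  # ['B', 'A']
```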

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02480316 0.04043341 0.03790212 0.02258253 0.02496028 0.02353382
 0.02346492 0.02378702 0.02321482 0.02300453]

mean value: 0.02676866054534912

key: score_time
value: [0.01101851 0.01310444 0.01055646 0.01048899 0.01055861 0.01072121
 0.01049709 0.01050425 0.01051044 0.01050949]

mean value: 0.010846948623657227

key: test_mcc
value: [0.8953202  0.8953202  0.82512315 0.79110556 0.71611487 0.89342711
 0.75434227 0.75047877 0.68250015 0.82195294]

mean value: 0.8025685230193058

key: train_mcc
value: [0.82263766 0.83068165 0.82666897 0.83070006 0.83890131 0.81930411
 0.83123063 0.8387452  0.83529327 0.81527029]

mean value: 0.8289433160428895

key: test_accuracy
value: [0.94736842 0.94736842 0.9122807  0.89473684 0.85714286 0.94642857
 0.875      0.875      0.83928571 0.91071429]

mean value: 0.900532581453634

key: train_accuracy
value: [0.9112426  0.91518738 0.91321499 0.91518738 0.91929134 0.90944882
 0.91535433 0.91929134 0.91732283 0.90748031]

mean value: 0.9143021323517992

key: test_fscore
value: [0.94736842 0.94736842 0.9122807  0.9        0.86206897 0.94545455
 0.88135593 0.87272727 0.84745763 0.90909091]

mean value: 0.9025172795971652

key: train_fscore
value: [0.9122807  0.91650485 0.9140625  0.91617934 0.92038835 0.91085271
 0.91682785 0.92007797 0.91891892 0.90873786]

mean value: 0.9154831064752351

key: test_precision
value: [0.93103448 0.93103448 0.92857143 0.87096774 0.83333333 0.96296296
 0.83870968 0.88888889 0.80645161 0.92592593]

mean value: 0.8917880537457845

key: train_precision
value: [0.9034749  0.90421456 0.9034749  0.90384615 0.90804598 0.89694656
 0.90114068 0.91119691 0.90151515 0.89655172]

mean value: 0.9030407533340564

key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.92857143
 0.92857143 0.85714286 0.89285714 0.89285714]

mean value: 0.9149014778325123

key: train_recall
value: [0.92125984 0.92913386 0.92490119 0.92885375 0.93307087 0.92519685
 0.93307087 0.92913386 0.93700787 0.92125984]

mean value: 0.9282888798979179

key: test_roc_auc
value: [0.9476601  0.9476601  0.91256158 0.89408867 0.85714286 0.94642857
 0.875      0.875      0.83928571 0.91071429]

mean value: 0.9005541871921182

key: train_roc_auc
value: [0.91122281 0.91515981 0.91323799 0.91521428 0.91929134 0.90944882
 0.91535433 0.91929134 0.91732283 0.90748031]

mean value: 0.914302387102798

key: test_jcc
value: [0.9        0.9        0.83870968 0.81818182 0.75757576 0.89655172
 0.78787879 0.77419355 0.73529412 0.83333333]

mean value: 0.8241718764561139

key: train_jcc
value: [0.83870968 0.84587814 0.84172662 0.84532374 0.85251799 0.83629893
 0.84642857 0.85198556 0.85       0.83274021]

mean value: 0.8441609435846644

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
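The numerical columns in this pipeline are already MinMax-scaled, so the remaining lever the warning suggests is a larger max_iter (or a different solver) for LogisticRegressionCV. A minimal sketch on synthetic data, assuming scikit-learn is installed; max_iter=5000 is an illustrative value, not one taken from the script, and the simplified pipeline here omits the script's ColumnTransformer/OneHotEncoder step.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # Default max_iter is 100; raising it gives lbfgs room to converge
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

# Record any ConvergenceWarning raised during fitting
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings:", n_conv)
```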
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.66862154 0.89965343 0.87873912 0.72013283 0.79384041 0.85204673
 0.71222496 0.79153037 0.77842331 0.70308733]

mean value: 0.779830002784729

key: score_time
value: [0.01159906 0.02059031 0.01229548 0.01344728 0.01333761 0.01349497
 0.01231074 0.0122211  0.0130713  0.01268458]

mean value: 0.013505244255065918

key: test_mcc
value: [0.93202124 0.92980296 0.92980296 0.85960591 0.78772636 1.
 0.85933785 0.85714286 0.78772636 0.78772636]

mean value: 0.8730892854406824

key: train_mcc
value: [0.93294638 0.93691352 0.94480151 0.93691156 0.93703692 0.93703692
 0.92520402 0.9332517  0.92520402 0.94095217]

mean value: 0.9350258732625361

key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.92982456 0.89285714 1.
 0.92857143 0.92857143 0.89285714 0.89285714]

mean value: 0.9360275689223058

key: train_accuracy
value: [0.96646943 0.96844181 0.97238659 0.96844181 0.96850394 0.96850394
 0.96259843 0.96653543 0.96259843 0.97047244]

mean value: 0.967495224339561

key: test_fscore
value: [0.96296296 0.96428571 0.96551724 0.93103448 0.89655172 1.
 0.93103448 0.92857143 0.89655172 0.88888889]

mean value: 0.9365398649881409

key: train_fscore
value: [0.96646943 0.96837945 0.97222222 0.96825397 0.96837945 0.96837945
 0.96267191 0.96620278 0.96267191 0.9704142 ]

mean value: 0.9674044754283551

key: test_precision
value: [1.         0.96428571 0.96551724 0.93103448 0.86666667 1.
 0.9        0.92857143 0.86666667 0.92307692]

mean value: 0.934581912340533

key: train_precision
value: [0.96837945 0.97222222 0.97609562 0.97211155 0.97222222 0.97222222
 0.96078431 0.97590361 0.96078431 0.97233202]

mean value: 0.9703057542340813

key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 1.
 0.96428571 0.92857143 0.92857143 0.85714286]

mean value: 0.9396551724137931

key: train_recall
value: [0.96456693 0.96456693 0.96837945 0.96442688 0.96456693 0.96456693
 0.96456693 0.95669291 0.96456693 0.96850394]

mean value: 0.9645404749307522

key: test_roc_auc
value: [0.96428571 0.96490148 0.96490148 0.92980296 0.89285714 1.
 0.92857143 0.92857143 0.89285714 0.89285714]

mean value: 0.935960591133005

key: train_roc_auc
value: [0.96647319 0.96844947 0.9723787  0.96843391 0.96850394 0.96850394
 0.96259843 0.96653543 0.96259843 0.97047244]

mean value: 0.9674947869658586

key: test_jcc
value: [0.92857143 0.93103448 0.93333333 0.87096774 0.8125     1.
 0.87096774 0.86666667 0.8125     0.8       ]

mean value: 0.8826541395201017

key: train_jcc
value: [0.9351145  0.93869732 0.94594595 0.93846154 0.93869732 0.93869732
 0.9280303  0.93461538 0.9280303  0.94252874]

mean value: 0.9368818668555441

MCC on Blind test: 0.23

Accuracy on Blind test: 0.65
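The key/value blocks above have the same shape as the dictionary returned by scikit-learn's cross_validate with return_train_score=True, a dict of named scorers, and 10 folds. That means fit_time and score_time, then test_ and train_ pairs for mcc, accuracy, fscore, precision, recall, roc_auc and jcc, each holding 10 values. The sketch below shows how such output could be produced. It assumes, without confirmation from the log, that the script uses cross_validate with scorer names matching these keys and a stratified 10-fold split. It also assumes a separate blind set that is scored once after refitting on the CV data. The data, split sizes and model are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, jaccard_score, make_scorer,
                             matthews_corrcoef)
from sklearn.model_selection import (StratifiedKFold, cross_validate,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Placeholder data; hold out a "blind" set that cross-validation never sees.
X, y = make_classification(n_samples=557, n_features=20, random_state=42)
X_cv, X_blind, y_cv, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scorer names chosen so cross_validate emits keys such as test_mcc, train_jcc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

pipe = Pipeline([("prep", MinMaxScaler()),
                 ("model", LogisticRegression(max_iter=1000, random_state=42))])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X_cv, y_cv, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in roughly the same layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")

# Refit on all CV data, then score the blind set once.
pipe.fit(X_cv, y_cv)
y_pred = pipe.predict(X_blind)
mcc_blind = matthews_corrcoef(y_blind, y_pred)
acc_blind = accuracy_score(y_blind, y_pred)
print(f"MCC on Blind test: {mcc_blind:.2f}")
print(f"Accuracy on Blind test: {acc_blind:.2f}")
```

The blind set never takes part in cross-validation, so its MCC and accuracy are computed separately from the fold means. They can differ markedly from those means, as with LogisticRegressionCV above (mean test_mcc 0.87, blind-test MCC 0.23).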

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01145101 0.01084852 0.0087235  0.0084331  0.0084486  0.00842762
 0.00777602 0.00863409 0.00846457 0.00798273]

mean value: 0.008918976783752442

key: score_time
value: [0.01097488 0.00906825 0.00876808 0.00882697 0.0086298  0.00885534
 0.00874305 0.00842094 0.00866842 0.00866818]

mean value: 0.008962392807006836

key: test_mcc
value: [0.77728159 0.68736396 0.77903565 0.56277738 0.43876345 0.49030429
 0.75434227 0.65814518 0.73127242 0.65814518]

mean value: 0.6537431378840208

key: train_mcc
value: [0.65218808 0.64992518 0.66460838 0.66501403 0.62068788 0.66768511
 0.6527166  0.71796573 0.66658604 0.66539291]

mean value: 0.66227699222193

key: test_accuracy
value: [0.87719298 0.84210526 0.87719298 0.77192982 0.71428571 0.73214286
 0.875      0.82142857 0.85714286 0.82142857]

mean value: 0.818984962406015

key: train_accuracy
value: [0.81854043 0.81656805 0.82445759 0.82248521 0.79330709 0.82677165
 0.81889764 0.85629921 0.82677165 0.82480315]

mean value: 0.8228901675752069

key: test_fscore
value: [0.85714286 0.83018868 0.8627451  0.74509804 0.68       0.68085106
 0.86792453 0.8        0.84       0.8       ]

mean value: 0.7963950265774716

key: train_fscore
value: [0.79735683 0.79379157 0.80266075 0.7972973  0.75294118 0.80701754
 0.79735683 0.84696017 0.80786026 0.80353201]

mean value: 0.8006774440728486

key: test_precision
value: [1.         0.88       1.         0.86363636 0.77272727 0.84210526
 0.92       0.90909091 0.95454545 0.90909091]

mean value: 0.9051196172248803

key: train_precision
value: [0.905      0.90862944 0.91414141 0.92670157 0.93567251 0.91089109
 0.905      0.9058296  0.90686275 0.91457286]

mean value: 0.9133301236007405

key: test_recall
value: [0.75       0.78571429 0.75862069 0.65517241 0.60714286 0.57142857
 0.82142857 0.71428571 0.75       0.71428571]

mean value: 0.712807881773399

key: train_recall
value: [0.71259843 0.70472441 0.71541502 0.69960474 0.62992126 0.72440945
 0.71259843 0.79527559 0.72834646 0.71653543]

mean value: 0.7139429211664747

key: test_roc_auc
value: [0.875      0.841133   0.87931034 0.77401478 0.71428571 0.73214286
 0.875      0.82142857 0.85714286 0.82142857]

mean value: 0.8190886699507389

key: train_roc_auc
value: [0.81874981 0.81678908 0.82424294 0.82224332 0.79330709 0.82677165
 0.81889764 0.85629921 0.82677165 0.82480315]

mean value: 0.8228875540755034

key: test_jcc
value: [0.75       0.70967742 0.75862069 0.59375    0.51515152 0.51612903
 0.76666667 0.66666667 0.72413793 0.66666667]

mean value: 0.6667466587454074

key: train_jcc
value: [0.66300366 0.65808824 0.67037037 0.66292135 0.60377358 0.67647059
 0.66300366 0.73454545 0.67765568 0.67158672]

mean value: 0.6681419301195666

MCC on Blind test: 0.34

Accuracy on Blind test: 0.78
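In both model runs above, test_roc_auc tracks test_accuracy almost exactly. The two differ only in the 57-sample folds, where the classes cannot split evenly. This pattern is consistent with ROC AUC being computed from hard 0/1 predictions rather than from probabilities or decision scores. With a single threshold, AUC reduces to (TPR + TNR) / 2, which is balanced accuracy.

The check below reconstructs fold 1 of the LogisticRegressionCV run from its logged metrics. Precision 1.0, recall 0.92857143 and accuracy 0.96491228 imply 28 positives with 26 recovered, and 29 negatives all predicted correctly. Whether the script actually scores hard labels is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Hypothetical fold reconstructed from the logged fold-1 metrics:
# 28 positives (26 predicted correctly), 29 negatives (all predicted correctly).
y_true = np.array([1] * 28 + [0] * 29)
y_pred = np.array([1] * 26 + [0] * 2 + [0] * 29)

auc_from_labels = roc_auc_score(y_true, y_pred)    # hard labels, not scores
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (TPR + TNR) / 2
acc = accuracy_score(y_true, y_pred)

print(f"roc_auc on hard labels: {auc_from_labels:.8f}")  # 0.96428571
print(f"balanced accuracy:      {bal_acc:.8f}")          # 0.96428571
print(f"accuracy:               {acc:.8f}")              # 0.96491228
```

If a threshold-independent AUC is wanted, scikit-learn's built-in "roc_auc" scorer uses predict_proba or decision_function rather than predict.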

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
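The metric dump below has the same shape as the dictionary returned by scikit-learn's `cross_validate` when it is called with `return_train_score=True` and a dict of named scorers: `fit_time`, `score_time`, then paired `test_*`/`train_*` keys. The sketch below is an assumption about how output of this shape could be produced, not the script's actual code. The toy data, the three-column subset, the `StratifiedKFold` settings and the scorer definitions (including `make_scorer(roc_auc_score)` applied to hard predictions) are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real feature table
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),    # numeric -> MinMaxScaler
    "rsa": rng.uniform(0, 1, n),                 # numeric -> MinMaxScaler
    "ss_class": rng.choice(["H", "E", "C"], n),  # categorical -> OneHotEncoder
})
y = pd.Series(rng.integers(0, 2, n))

num_cols = list(X.select_dtypes(include="number").columns)
cat_cols = list(X.select_dtypes(exclude="number").columns)

# Same two-step layout as the logged pipeline: 'prep' then 'model'
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(remainder="passthrough", transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ])),
    ("model", BernoulliNB()),
])

# Scorer names chosen so cross_validate emits keys such as test_mcc and train_jcc
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as this log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```

Each `value` array holds one entry per fold, and `mean value` is the plain arithmetic mean over the 10 folds.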

key: fit_time
value: [0.00911832 0.00889468 0.00884533 0.00869632 0.00868821 0.00798774
 0.00814915 0.0086484  0.0087316  0.00859666]

mean value: 0.008635640144348145

key: score_time
value: [0.00931168 0.00890446 0.00889421 0.00877428 0.00848413 0.00837326
 0.00868559 0.00825214 0.00880623 0.00885201]

mean value: 0.008733797073364257

key: test_mcc
value: [0.8953202  0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
 0.67900461 0.75047877 0.64450339 0.82195294]

mean value: 0.766179444196459

key: train_mcc
value: [0.76340037 0.76340037 0.76353762 0.75544282 0.77564465 0.77588525
 0.77564465 0.77991449 0.79149195 0.76800824]

mean value: 0.7712370421013379

key: test_accuracy
value: [0.94736842 0.9122807  0.92982456 0.85964912 0.85714286 0.875
 0.83928571 0.875      0.82142857 0.91071429]

mean value: 0.8827694235588972

key: train_accuracy
value: [0.8816568  0.8816568  0.8816568  0.87771203 0.88779528 0.88779528
 0.88779528 0.88976378 0.89566929 0.88385827]

mean value: 0.8855359611113699

key: test_fscore
value: [0.94736842 0.9122807  0.93103448 0.86206897 0.86206897 0.87272727
 0.84210526 0.87272727 0.82758621 0.90909091]

mean value: 0.8839058461200022

key: train_fscore
value: [0.8828125  0.8828125  0.8828125  0.87698413 0.88845401 0.88932039
 0.88845401 0.89147287 0.89668616 0.88543689]

mean value: 0.8865245960082

key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
 0.82758621 0.88888889 0.8        0.92592593]

mean value: 0.8785312899106003

key: train_precision
value: [0.87596899 0.87596899 0.87258687 0.88047809 0.88326848 0.87739464
 0.88326848 0.8778626  0.88803089 0.87356322]

mean value: 0.8788391247569809

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
 0.85714286 0.85714286 0.85714286 0.89285714]

mean value: 0.8900246305418719

key: train_recall
value: [0.88976378 0.88976378 0.89328063 0.87351779 0.89370079 0.9015748
 0.89370079 0.90551181 0.90551181 0.8976378 ]

mean value: 0.8943963773303041

key: test_roc_auc
value: [0.9476601  0.91256158 0.92980296 0.85960591 0.85714286 0.875
 0.83928571 0.875      0.82142857 0.91071429]

mean value: 0.882820197044335

key: train_roc_auc
value: [0.88164078 0.88164078 0.88167969 0.87770378 0.88779528 0.88779528
 0.88779528 0.88976378 0.89566929 0.88385827]

mean value: 0.8855342192897825

key: test_jcc
value: [0.9        0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
 0.72727273 0.77419355 0.70588235 0.83333333]

mean value: 0.7939704444827784
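For every model in this log, `test_jcc` moves fold by fold with `test_fscore`. That is expected for a binary label: with J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), the Jaccard score is J = F1/(2 - F1). The short check below uses the BernoulliNB per-fold values copied from the output above; nothing else is assumed.

```python
import numpy as np

# Per-fold test_fscore and test_jcc copied from the Naive Bayes (BernoulliNB) run above
test_fscore = np.array([0.94736842, 0.9122807, 0.93103448, 0.86206897, 0.86206897,
                        0.87272727, 0.84210526, 0.87272727, 0.82758621, 0.90909091])
test_jcc = np.array([0.9, 0.83870968, 0.87096774, 0.75757576, 0.75757576,
                     0.77419355, 0.72727273, 0.77419355, 0.70588235, 0.83333333])

# J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) imply J = F1 / (2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-6))  # prints True
```

Because the two metrics are monotonically related, `test_jcc` ranks models the same way `test_fscore` does and adds no independent information.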

key: train_jcc
value: [0.79020979 0.79020979 0.79020979 0.78091873 0.79929577 0.8006993
 0.79929577 0.8041958  0.81272085 0.79442509]

mean value: 0.7962180687899996

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00799131 0.0081377  0.00847793 0.00799417 0.00966215 0.01012588
 0.00796223 0.00843406 0.00802898 0.00803018]

mean value: 0.008484458923339844

key: score_time
value: [0.01276636 0.0125227  0.01176548 0.0133183  0.01890969 0.01576591
 0.01556277 0.01163697 0.01146603 0.01169634]

mean value: 0.013541054725646973

key: test_mcc
value: [0.8953202  0.78940887 0.71921182 0.79110556 0.75047877 0.68250015
 0.60753044 0.75047877 0.58501794 0.82195294]

mean value: 0.7393005465274064

key: train_mcc
value: [0.78707279 0.78304441 0.77919572 0.79093074 0.79951627 0.78742599
 0.80317451 0.80759374 0.79936749 0.78395685]

mean value: 0.79212785011907

key: test_accuracy
value: [0.94736842 0.89473684 0.85964912 0.89473684 0.875      0.83928571
 0.80357143 0.875      0.78571429 0.91071429]

mean value: 0.868577694235589

key: train_accuracy
value: [0.89349112 0.89151874 0.88954635 0.89546351 0.8996063  0.89370079
 0.9015748  0.90354331 0.8996063  0.89173228]

mean value: 0.8959783503393437

key: test_fscore
value: [0.94736842 0.89285714 0.86206897 0.9        0.87719298 0.83018868
 0.80701754 0.87272727 0.80645161 0.90909091]

mean value: 0.8704963529709496

key: train_fscore
value: [0.89453125 0.89151874 0.89019608 0.8950495  0.90097087 0.89411765
 0.90196078 0.90522244 0.9005848  0.89361702]

mean value: 0.8967769129948973

key: test_precision
value: [0.93103448 0.89285714 0.86206897 0.87096774 0.86206897 0.88
 0.79310345 0.88888889 0.73529412 0.92592593]

mean value: 0.8642209679323466

key: train_precision
value: [0.8875969  0.89328063 0.88326848 0.8968254  0.88888889 0.890625
 0.8984375  0.88973384 0.89189189 0.878327  ]

mean value: 0.8898875528234225

key: test_recall
value: [0.96428571 0.89285714 0.86206897 0.93103448 0.89285714 0.78571429
 0.82142857 0.85714286 0.89285714 0.89285714]

mean value: 0.8793103448275862

key: train_recall
value: [0.9015748  0.88976378 0.8972332  0.89328063 0.91338583 0.8976378
 0.90551181 0.92125984 0.90944882 0.90944882]

mean value: 0.9038545330055087

key: test_roc_auc
value: [0.9476601  0.89470443 0.85960591 0.89408867 0.875      0.83928571
 0.80357143 0.875      0.78571429 0.91071429]

mean value: 0.8685344827586207

key: train_roc_auc
value: [0.89347515 0.89152221 0.88956148 0.89545921 0.8996063  0.89370079
 0.9015748  0.90354331 0.8996063  0.89173228]

mean value: 0.8959781830630855

key: test_jcc
value: [0.9        0.80645161 0.75757576 0.81818182 0.78125    0.70967742
 0.67647059 0.77419355 0.67567568 0.83333333]

mean value: 0.7732809753647041

key: train_jcc
value: [0.80918728 0.80427046 0.80212014 0.81003584 0.81978799 0.80851064
 0.82142857 0.82685512 0.81914894 0.80769231]

mean value: 0.8129037288551658

MCC on Blind test: 0.25

Accuracy on Blind test: 0.72

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01498389 0.01484656 0.01499653 0.01508522 0.01529217 0.01519728
 0.0148685  0.01477909 0.01449656 0.01571655]

mean value: 0.015026235580444336

key: score_time
value: [0.00945044 0.00991273 0.00937939 0.00918961 0.0093348  0.00939512
 0.00931787 0.00938892 0.0092535  0.00957513]

mean value: 0.009419751167297364

key: test_mcc
value: [0.8953202  0.8953202  0.85960591 0.75462449 0.71611487 0.78772636
 0.67900461 0.75047877 0.64450339 0.78772636]

mean value: 0.7770425163515529

key: train_mcc
value: [0.77929987 0.77929987 0.78334713 0.79108822 0.79936749 0.78779242
 0.80337378 0.79567034 0.80324922 0.77974514]

mean value: 0.7902233465851163

key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.87719298 0.85714286 0.89285714
 0.83928571 0.875      0.82142857 0.89285714]

mean value: 0.8880325814536341

key: train_accuracy
value: [0.88954635 0.88954635 0.89151874 0.89546351 0.8996063  0.89370079
 0.9015748  0.8976378  0.9015748  0.88976378]

mean value: 0.894993321840687

key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.88135593 0.86206897 0.88888889
 0.84210526 0.87272727 0.82758621 0.88888889]

mean value: 0.8889392743144012

key: train_fscore
value: [0.89105058 0.89105058 0.89278752 0.8962818  0.9005848  0.89534884
 0.90272374 0.89922481 0.90234375 0.89105058]

mean value: 0.8962446999871675

key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.86666667 0.83333333 0.92307692
 0.82758621 0.88888889 0.8        0.92307692]

mean value: 0.8855732390215149

key: train_precision
value: [0.88076923 0.88076923 0.88076923 0.8875969  0.89189189 0.88167939
 0.89230769 0.88549618 0.89534884 0.88076923]

mean value: 0.88573978162297

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
 0.85714286 0.85714286 0.85714286 0.85714286]

mean value: 0.8934729064039408

key: train_recall
value: [0.9015748  0.9015748  0.90513834 0.90513834 0.90944882 0.90944882
 0.91338583 0.91338583 0.90944882 0.9015748 ]

mean value: 0.9070119199526937

key: test_roc_auc
value: [0.9476601  0.9476601  0.92980296 0.87684729 0.85714286 0.89285714
 0.83928571 0.875      0.82142857 0.89285714]

mean value: 0.8880541871921183

key: train_roc_auc
value: [0.88952258 0.88952258 0.89154555 0.89548256 0.8996063  0.89370079
 0.9015748  0.8976378  0.9015748  0.88976378]

mean value: 0.8949931530297843

key: test_jcc
value: [0.9        0.9        0.87096774 0.78787879 0.75757576 0.8
 0.72727273 0.77419355 0.70588235 0.8       ]

mean value: 0.802377091599103

key: train_jcc
value: [0.80350877 0.80350877 0.80633803 0.81205674 0.81914894 0.81052632
 0.82269504 0.81690141 0.82206406 0.80350877]

mean value: 0.8120256834358026

MCC on Blind test: 0.22

Accuracy on Blind test: 0.71

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.55003285 1.53659248 1.37875795 1.49586344 1.57297134 1.42394137
 1.5258956  1.60407376 1.39470887 1.90794826]

mean value: 1.5390785932540894

key: score_time
value: [0.01411986 0.01391315 0.01411891 0.01399922 0.01386738 0.01426673
 0.01415634 0.01177144 0.01455188 0.01436853]

mean value: 0.013913345336914063

key: test_mcc
value: [0.8951918  0.92980296 0.82490815 0.8953202  0.75047877 0.89802651
 0.89342711 0.78772636 0.78772636 0.85714286]

mean value: 0.851975108089572

key: train_mcc
value: [0.97245522 0.96055211 0.97239383 0.96055211 0.97250878 0.9645744
 0.96463421 0.96850394 0.9645744  0.9645744 ]

mean value: 0.9665323428042959

key: test_accuracy
value: [0.94736842 0.96491228 0.9122807  0.94736842 0.875      0.94642857
 0.94642857 0.89285714 0.89285714 0.92857143]

mean value: 0.9254072681704261

key: train_accuracy
value: [0.98619329 0.98027613 0.98619329 0.98027613 0.98622047 0.98228346
 0.98228346 0.98425197 0.98228346 0.98228346]

mean value: 0.9832545155228378

key: test_fscore
value: [0.94545455 0.96428571 0.91525424 0.94736842 0.87719298 0.94339623
 0.94736842 0.88888889 0.89655172 0.92857143]

mean value: 0.9254332589603143

key: train_fscore
value: [0.98613861 0.98031496 0.98613861 0.98023715 0.98613861 0.98224852
 0.98217822 0.98425197 0.98224852 0.98231827]

mean value: 0.9832213455229958

key: test_precision
value: [0.96296296 0.96428571 0.9        0.96428571 0.86206897 1.
 0.93103448 0.92307692 0.86666667 0.92857143]

mean value: 0.9302952858125272

key: train_precision
value: [0.99203187 0.98031496 0.98809524 0.98023715 0.99203187 0.98418972
 0.98804781 0.98425197 0.98418972 0.98039216]

mean value: 0.9853782478667216

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
 0.96428571 0.85714286 0.92857143 0.92857143]

mean value: 0.9219211822660098

key: train_recall
value: [0.98031496 0.98031496 0.98418972 0.98023715 0.98031496 0.98031496
 0.97637795 0.98425197 0.98031496 0.98425197]

mean value: 0.9810883570383742

key: test_roc_auc
value: [0.94704433 0.96490148 0.91194581 0.9476601  0.875      0.94642857
 0.94642857 0.89285714 0.89285714 0.92857143]

mean value: 0.9253694581280789

key: train_roc_auc
value: [0.98620491 0.98027606 0.98618935 0.98027606 0.98622047 0.98228346
 0.98228346 0.98425197 0.98228346 0.98228346]

mean value: 0.9832552674986773

key: test_jcc
value: [0.89655172 0.93103448 0.84375    0.9        0.78125    0.89285714
 0.9        0.8        0.8125     0.86666667]

mean value: 0.8624610016420361

key: train_jcc
value: [0.97265625 0.96138996 0.97265625 0.96124031 0.97265625 0.96511628
 0.96498054 0.96899225 0.96511628 0.96525097]

mean value: 0.9670055337667078

MCC on Blind test: 0.26

Accuracy on Blind test: 0.66

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01483965 0.0120194  0.01096416 0.01053548 0.01014924 0.01097488
 0.01111197 0.01054001 0.01057744 0.01146078]

mean value: 0.011317300796508788

key: score_time
value: [0.01083517 0.00851774 0.00850797 0.00900292 0.00824666 0.00808549
 0.0081172  0.00822592 0.00818729 0.00816345]

mean value: 0.00858898162841797

key: test_mcc
value: [0.93202124 0.8951918  0.85960591 0.8953202  0.75434227 0.96490128
 0.75434227 0.89342711 0.96490128 0.92857143]

mean value: 0.8842624793067261

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.94736842 0.92982456 0.94736842 0.875      0.98214286
 0.875      0.94642857 0.98214286 0.96428571]

mean value: 0.9414473684210526

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96296296 0.94545455 0.93103448 0.94736842 0.88135593 0.98181818
 0.88135593 0.94736842 0.98181818 0.96428571]

mean value: 0.942482277561025

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96296296 0.93103448 0.96428571 0.83870968 1.
 0.83870968 0.93103448 1.         0.96428571]

mean value: 0.9431022711890342

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92857143 0.92857143 0.93103448 0.93103448 0.92857143 0.96428571
 0.92857143 0.96428571 0.96428571 0.96428571]

mean value: 0.9433497536945813

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96428571 0.94704433 0.92980296 0.9476601  0.875      0.98214286
 0.875      0.94642857 0.98214286 0.96428571]

mean value: 0.9413793103448276

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.92857143 0.89655172 0.87096774 0.9        0.78787879 0.96428571
 0.78787879 0.9        0.96428571 0.93103448]

mean value: 0.8931454381732469

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
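The Decision Tree reports 1.0 for every `train_*` metric but only 0.11 MCC on the blind test. A `DecisionTreeClassifier` with default settings (`max_depth=None`, `min_samples_leaf=1`) keeps splitting until every leaf is pure, so it reaches 100% training accuracy whenever no two rows share identical features with different labels. The toy sketch below illustrates that behaviour on hypothetical random data with pure-noise labels; it is not derived from this script's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: continuous features, so no two rows are identical
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 2, size=300)  # labels carry no signal at all

# Default settings: max_depth=None, min_samples_leaf=1 -> grow until leaves are pure
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
train_acc = tree.score(X, y)
print(train_acc)  # prints 1.0 even though the labels are random
```

A perfect training score from an unconstrained tree therefore says little about generalisation; the cross-validated and blind-test figures are the informative ones.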

MCC on Blind test: 0.11

Accuracy on Blind test: 0.36

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])
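The key/value/mean blocks that follow each pipeline have the layout of scikit-learn's `cross_validate` output with a dict of named scorers and `return_train_score=True` (fit_time, score_time, then test_/train_ pairs per scorer). A minimal sketch of such a setup, assuming 10-fold StratifiedKFold and `make_scorer`-wrapped metrics; the toy DataFrame and its columns are hypothetical, and whether the original script wraps `roc_auc_score` around hard predictions (as done here) is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real feature table.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "C"], n),
})
y = rng.integers(0, 2, n)

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Same shape as the logged pipeline: scale numerics, one-hot encode categoricals.
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", ExtraTreesClassifier(random_state=42))])

# Scorer names chosen so the result keys read test_mcc, train_jcc, etc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Even on random labels, the unpruned Extra Trees model reaches 1.0 on every train_ metric, which is the same pattern the log shows for the fully grown tree models.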

key: fit_time
value: [0.10266161 0.12123942 0.11855769 0.10826373 0.11115503 0.13033295
 0.11559772 0.10858154 0.10145831 0.10418749]

mean value: 0.11220355033874511

key: score_time
value: [0.01758289 0.02243209 0.02058554 0.02079964 0.02117038 0.01818752
 0.01786613 0.01750755 0.01726437 0.01872659]

mean value: 0.01921226978302002

key: test_mcc
value: [0.92980296 0.86189955 0.85960591 0.82490815 0.85714286 0.89342711
 0.92857143 0.82195294 0.78571429 0.92857143]

mean value: 0.8691596616885752

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.92982456 0.92982456 0.9122807  0.92857143 0.94642857
 0.96428571 0.91071429 0.89285714 0.96428571]

mean value: 0.9343984962406016

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.93103448 0.93103448 0.91525424 0.92857143 0.94736842
 0.96428571 0.90909091 0.89285714 0.96428571]

mean value: 0.9348068247234632

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.9        0.93103448 0.9        0.92857143 0.93103448
 0.96428571 0.92592593 0.89285714 0.96428571]

mean value: 0.9302280605728882

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.96428571
 0.96428571 0.89285714 0.89285714 0.96428571]

mean value: 0.9397783251231527

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96490148 0.93041872 0.92980296 0.91194581 0.92857143 0.94642857
 0.96428571 0.91071429 0.89285714 0.96428571]

mean value: 0.9344211822660099

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.87096774 0.87096774 0.84375    0.86666667 0.9
 0.93103448 0.83333333 0.80645161 0.93103448]

mean value: 0.8785240545050056

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
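The test_jcc values carry no information beyond test_fscore. For a binary positive class, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The check below applies that identity to the Extra Trees fold values printed above; the arrays are copied from this log.

```python
import numpy as np

# Per-fold values copied from the Extra Trees test_fscore and test_jcc blocks.
test_fscore = np.array([0.96428571, 0.93103448, 0.93103448, 0.91525424, 0.92857143,
                        0.94736842, 0.96428571, 0.90909091, 0.89285714, 0.96428571])
test_jcc = np.array([0.93103448, 0.87096774, 0.87096774, 0.84375, 0.86666667,
                     0.9, 0.93103448, 0.83333333, 0.80645161, 0.93103448])

# Binary identity: Jaccard = F1 / (2 - F1).
derived_jcc = test_fscore / (2 - test_fscore)
print(np.round(derived_jcc, 8))
print(np.allclose(derived_jcc, test_jcc, atol=1e-6))  # True: the two metrics move together
```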

MCC on Blind test: 0.32

Accuracy on Blind test: 0.71

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00810385 0.00779867 0.00782776 0.0077281  0.00779986 0.00821972
 0.00770831 0.0078783  0.00787997 0.00846457]

mean value: 0.007940912246704101

key: score_time
value: [0.00810742 0.00859261 0.00797725 0.00838828 0.00801611 0.0083127
 0.00793481 0.00823236 0.00814414 0.00799465]

mean value: 0.008170032501220703

key: test_mcc
value: [0.8951918  0.78940887 0.68472906 0.8615634  0.5118907  0.65814518
 0.89342711 0.75434227 0.85933785 0.92857143]

mean value: 0.7836607672751627

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.89473684 0.84210526 0.92982456 0.75       0.82142857
 0.94642857 0.875      0.92857143 0.96428571]

mean value: 0.8899749373433584

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94545455 0.89285714 0.84210526 0.93333333 0.77419355 0.8
 0.94545455 0.86792453 0.92592593 0.96428571]

mean value: 0.8891534547158085

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96296296 0.89285714 0.85714286 0.90322581 0.70588235 0.90909091
 0.96296296 0.92       0.96153846 0.96428571]

mean value: 0.90399491702338

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92857143 0.89285714 0.82758621 0.96551724 0.85714286 0.71428571
 0.92857143 0.82142857 0.89285714 0.96428571]

mean value: 0.8793103448275862

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94704433 0.89470443 0.84236453 0.92918719 0.75       0.82142857
 0.94642857 0.875      0.92857143 0.96428571]

mean value: 0.8899014778325124

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.89655172 0.80645161 0.72727273 0.875      0.63157895 0.66666667
 0.89655172 0.76666667 0.86206897 0.93103448]

mean value: 0.8059843517429431

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.32

Accuracy on Blind test: 0.8

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.33799982 1.31312704 1.34472799 1.3418479  1.33652711 1.34488511
 1.34392357 1.36048555 1.3535583  1.39411759]

mean value: 1.3471199989318847

key: score_time
value: [0.09937048 0.09200263 0.0983386  0.09501576 0.09319186 0.09341598
 0.09465718 0.09849429 0.09647918 0.09067464]

mean value: 0.09516406059265137

key: test_mcc
value: [0.96547546 0.8953202  0.92980296 0.8951918  0.85933785 1.
 0.92857143 0.89342711 0.93094934 0.92857143]

mean value: 0.922664756643307

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.9609962406015038

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.93103448 1.
 0.96428571 0.94736842 0.96296296 0.96428571]

mean value: 0.961379368196865

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.93103448 0.96551724 0.93333333 0.9        1.
 0.96428571 0.93103448 1.         0.96428571]

mean value: 0.9589490968801314

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.96428571 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.9645320197044335

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.9476601  0.96490148 0.94704433 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.960960591133005

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.9        0.93333333 0.90322581 0.87096774 1.
 0.93103448 0.9        0.92857143 0.93103448]

mean value: 0.9262452990094814

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.19

Accuracy on Blind test: 0.49

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
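The FutureWarning raised while fitting 'Random Forest2' states that `max_features='auto'` is deprecated and recommends `'sqrt'`, which is what `'auto'` meant for classifiers. A minimal sketch of the same estimator with the parameter spelled out, assuming scikit-learn 1.1 or later; the toy data from `make_classification` is only there to show that fitting no longer raises the deprecation.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 'Random Forest2' settings with max_features made explicit.
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

# Hypothetical toy data; turn FutureWarning into an error to prove it is gone.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    rf2.fit(X, y)

print(rf2.get_params()["max_features"])
print(round(rf2.oob_score_, 2))
```

Removing the argument entirely would also work, since `'sqrt'` is the default for RandomForestClassifier, but writing it out keeps the intent visible in the model list.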

key: fit_time
value: [0.92916346 0.96192503 0.91043425 0.91636205 0.90995216 0.86325049
 0.90527892 0.94523883 0.87693882 0.92262363]

mean value: 0.9141167640686035

key: score_time
value: [0.27692175 0.23238063 0.2464447  0.29479647 0.18034601 0.23226142
 0.23399925 0.25210285 0.25105858 0.17118669]

mean value: 0.23714983463287354

key: test_mcc
value: [0.96547546 0.8953202  0.92980296 0.8951918  0.85714286 1.
 0.92857143 0.89342711 0.93094934 0.92857143]

mean value: 0.9224452574728608

key: train_mcc
value: [0.94503515 0.95277969 0.94878539 0.95278262 0.95687833 0.94112724
 0.94888508 0.95278544 0.94499908 0.94900279]

mean value: 0.9493060812767512

key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.9609962406015038

key: train_accuracy
value: [0.97238659 0.97633136 0.97435897 0.97633136 0.97834646 0.97047244
 0.97440945 0.97637795 0.97244094 0.97440945]

mean value: 0.9745864976937054

key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.92857143 1.
 0.96428571 0.94736842 0.96296296 0.96428571]

mean value: 0.9611330627781457

key: train_fscore
value: [0.97276265 0.9765625  0.97445972 0.97647059 0.9785575  0.97076023
 0.97455969 0.97647059 0.97265625 0.97465887]

mean value: 0.9747918592411458

key: test_precision
value: [1.         0.93103448 0.96551724 0.93333333 0.92857143 1.
 0.96428571 0.93103448 1.         0.96428571]

mean value: 0.9618062397372742

key: train_precision
value: [0.96153846 0.96899225 0.96875    0.9688716  0.96911197 0.96138996
 0.9688716  0.97265625 0.96511628 0.96525097]

mean value: 0.9670549325084619

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.92857143 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.9609605911330049

key: train_recall
value: [0.98425197 0.98425197 0.98023715 0.98418972 0.98818898 0.98031496
 0.98031496 0.98031496 0.98031496 0.98425197]

mean value: 0.9826631601879805

key: test_roc_auc
value: [0.98214286 0.9476601  0.96490148 0.94704433 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.960960591133005

key: train_roc_auc
value: [0.97236314 0.97631571 0.97437055 0.97634683 0.97834646 0.97047244
 0.97440945 0.97637795 0.97244094 0.97440945]

mean value: 0.974585291463073

key: test_jcc
value: [0.96428571 0.9        0.93333333 0.90322581 0.86666667 1.
 0.93103448 0.9        0.92857143 0.93103448]

mean value: 0.9258151914825997

key: train_jcc
value: [0.9469697  0.95419847 0.95019157 0.95402299 0.95801527 0.94318182
 0.95038168 0.95402299 0.94676806 0.95057034]

mean value: 0.9508322885933389

MCC on Blind test: 0.2

Accuracy on Blind test: 0.5

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
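BernoulliNB thresholds its inputs with `binarize=0.0` by default, and this pipeline feeds it MinMaxScaler output. After scaling, only a column's minimum maps to 0.0, so every other value becomes 1 and each numeric feature effectively reduces to "equal to the column minimum or not". The sketch below reproduces that transformation on a hypothetical single column; whether this behaviour is intended in the original script is not stated in the log.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, binarize

# Hypothetical numeric feature column (e.g. a distance).
x = np.array([[2.0], [5.0], [7.5], [10.0], [3.0]])

scaled = MinMaxScaler().fit_transform(x)  # min -> 0.0, max -> 1.0
bits = binarize(scaled, threshold=0.0)    # the thresholding BernoulliNB(binarize=0.0) applies to X

print(scaled.ravel())  # [0.     0.375  0.6875 1.     0.125 ]
print(bits.ravel())    # [0. 1. 1. 1. 1.]
```

If a finer split is wanted, a different `binarize` threshold could be passed to BernoulliNB, or GaussianNB (already in the model list) can use the scaled values directly.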

key: fit_time
value: [0.00826478 0.00817037 0.00865364 0.00832152 0.00791717 0.00800776
 0.00859976 0.00781083 0.00889087 0.00782251]

mean value: 0.00824592113494873

key: score_time
value: [0.00801754 0.00853825 0.01080704 0.00824523 0.00855088 0.00804639
 0.00831342 0.00840735 0.00844622 0.00819445]

mean value: 0.008556675910949708

key: test_mcc
value: [0.8953202  0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
 0.67900461 0.75047877 0.64450339 0.82195294]

mean value: 0.766179444196459

key: train_mcc
value: [0.76340037 0.76340037 0.76353762 0.75544282 0.77564465 0.77588525
 0.77564465 0.77991449 0.79149195 0.76800824]

mean value: 0.7712370421013379

key: test_accuracy
value: [0.94736842 0.9122807  0.92982456 0.85964912 0.85714286 0.875
 0.83928571 0.875      0.82142857 0.91071429]

mean value: 0.8827694235588972

key: train_accuracy
value: [0.8816568  0.8816568  0.8816568  0.87771203 0.88779528 0.88779528
 0.88779528 0.88976378 0.89566929 0.88385827]

mean value: 0.8855359611113699

key: test_fscore
value: [0.94736842 0.9122807  0.93103448 0.86206897 0.86206897 0.87272727
 0.84210526 0.87272727 0.82758621 0.90909091]

mean value: 0.8839058461200022

key: train_fscore
value: [0.8828125  0.8828125  0.8828125  0.87698413 0.88845401 0.88932039
 0.88845401 0.89147287 0.89668616 0.88543689]

mean value: 0.8865245960082

key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
 0.82758621 0.88888889 0.8        0.92592593]

mean value: 0.8785312899106003

key: train_precision
value: [0.87596899 0.87596899 0.87258687 0.88047809 0.88326848 0.87739464
 0.88326848 0.8778626  0.88803089 0.87356322]

mean value: 0.8788391247569809

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
 0.85714286 0.85714286 0.85714286 0.89285714]

mean value: 0.8900246305418719

key: train_recall
value: [0.88976378 0.88976378 0.89328063 0.87351779 0.89370079 0.9015748
 0.89370079 0.90551181 0.90551181 0.8976378 ]

mean value: 0.8943963773303041

key: test_roc_auc
value: [0.9476601  0.91256158 0.92980296 0.85960591 0.85714286 0.875
 0.83928571 0.875      0.82142857 0.91071429]

mean value: 0.882820197044335

key: train_roc_auc
value: [0.88164078 0.88164078 0.88167969 0.87770378 0.88779528 0.88779528
 0.88779528 0.88976378 0.89566929 0.88385827]

mean value: 0.8855342192897825

key: test_jcc
value: [0.9        0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
 0.72727273 0.77419355 0.70588235 0.83333333]

mean value: 0.7939704444827784

key: train_jcc
value: [0.79020979 0.79020979 0.79020979 0.78091873 0.79929577 0.8006993
 0.79929577 0.8041958  0.81272085 0.79442509]

mean value: 0.7962180687899996

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.07053018 0.055233   0.05712581 0.05738688 0.05596948 0.2296176
 0.04953313 0.04858136 0.05947566 0.05467701]

mean value: 0.07381300926208496

key: score_time
value: [0.01033711 0.01042008 0.01019645 0.01021481 0.01025701 0.01056623
 0.01241708 0.00986052 0.01005435 0.01003385]

mean value: 0.010435748100280761

key: test_mcc
value: [0.96547546 0.92980296 0.96547546 0.96547546 0.89342711 1.
 0.96490128 0.89342711 0.96490128 0.92857143]

mean value: 0.9471457541234694

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 0.98245614 0.94642857 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9733709273182957

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.96428571 0.98305085 0.98305085 0.94736842 1.
 0.98245614 0.94736842 0.98181818 0.96428571]

mean value: 0.9735502469579187

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96666667 0.96666667 0.93103448 1.
 0.96551724 0.93103448 1.         0.96428571]

mean value: 0.9689490968801313

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 1.         1.         0.96428571 1.
 1.         0.96428571 0.96428571 0.96428571]

mean value: 0.9785714285714285

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.96490148 0.98214286 0.98214286 0.94642857 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9732758620689657

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.93103448 0.96666667 0.96666667 0.9        1.
 0.96551724 0.9        0.96428571 0.93103448]

mean value: 0.9489490968801314

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
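The test_jcc values in the block above track test_fscore exactly. This is expected for binary labels: J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The snippet below checks that identity against the XGBoost fold values printed in this log. No new data is introduced; both arrays are copied from the test_fscore and test_jcc lines.

```python
import numpy as np

# Per-fold values copied from the XGBoost "test_fscore" and "test_jcc" lines above.
f1_logged = np.array([0.98181818, 0.96428571, 0.98305085, 0.98305085, 0.94736842,
                      1.0, 0.98245614, 0.94736842, 0.98181818, 0.96428571])
jcc_logged = np.array([0.96428571, 0.93103448, 0.96666667, 0.96666667, 0.9,
                       1.0, 0.96551724, 0.9, 0.96428571, 0.93103448])

# For binary classification, Jaccard = F1 / (2 - F1).
jcc_from_f1 = f1_logged / (2.0 - f1_logged)

print(np.round(jcc_from_f1, 8))
print("mean value:", jcc_from_f1.mean())
```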

MCC on Blind test: 0.08

Accuracy on Blind test: 0.36
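The two blind-test lines report the Matthews correlation coefficient and accuracy on a held-out set, rounded to two decimals. The sketch below shows how such lines could be produced with scikit-learn's `matthews_corrcoef` and `accuracy_score`, and computes MCC by hand from the confusion matrix, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The labels are hypothetical, and the use of `round(x, 2)` is an assumption inferred from the printed precision, not taken from the script.

```python
import math

from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Hypothetical blind-test labels and predictions (not taken from this log).
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0]

# For binary labels, ravel() yields counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Manual MCC from the confusion counts; defined as 0 when the denominator is 0.
denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc_manual = (tp * tn - fp * fn) / denom if denom else 0.0

print("MCC on Blind test:", round(matthews_corrcoef(y_true, y_pred), 2))
print("\nAccuracy on Blind test:", round(accuracy_score(y_true, y_pred), 2))
```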

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])
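The pipeline printed above scales the numerical columns with `MinMaxScaler`, one-hot encodes the categorical columns, passes any other columns through unchanged, and then fits the model. A minimal sketch of the same structure follows. It assumes a tiny synthetic table with only four of the logged column names (`ligand_distance`, `rsa`, `ss_class`, `active_site`); the values and category codes are invented for illustration, and the real feature table is not part of this log.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; values and category codes are hypothetical.
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "C"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = rng.integers(0, 2, n)

# Split columns by dtype, mirroring the 'num' and 'cat' groups in the log.
num_cols = list(X.select_dtypes(include="number").columns)
cat_cols = list(X.select_dtypes(exclude="number").columns)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",                 # any unlisted columns are kept as-is
)
pipe = Pipeline(steps=[("prep", prep), ("model", LinearDiscriminantAnalysis())])
pipe.fit(X, y)
print(pipe.predict(X)[:5])
```

Fitting the scaler inside the pipeline means that, under cross-validation, the min/max values are learned from each training fold only.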

key: fit_time
value: [0.01379895 0.04094553 0.04091215 0.04156613 0.04213095 0.04107809
 0.04116917 0.04115582 0.03444266 0.04156756]

mean value: 0.03787670135498047

key: score_time
value: [0.01027441 0.021981   0.01984763 0.02085829 0.02091908 0.02200484
 0.01950645 0.01698136 0.02125549 0.01091409]

mean value: 0.018454265594482423

key: test_mcc
value: [0.85960591 0.8953202  0.85960591 0.82490815 0.75434227 0.78772636
 0.75434227 0.71611487 0.68250015 0.82195294]

mean value: 0.7956419031963872

key: train_mcc
value: [0.86611359 0.85893744 0.84648438 0.84263794 0.86253233 0.85105352
 0.83910959 0.85105352 0.85545187 0.83890131]

mean value: 0.8512275503641218

key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807  0.875      0.89285714
 0.875      0.85714286 0.83928571 0.91071429]

mean value: 0.8969298245614035

key: train_accuracy
value: [0.93293886 0.92899408 0.92307692 0.92110454 0.93110236 0.92519685
 0.91929134 0.92519685 0.92716535 0.91929134]

mean value: 0.925335849291028

key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.88888889
 0.88135593 0.85185185 0.84745763 0.90909091]

mean value: 0.898222971102789

key: train_fscore
value: [0.93385214 0.93076923 0.92397661 0.92217899 0.93203883 0.92664093
 0.92069632 0.92664093 0.92898273 0.92038835]

mean value: 0.9266165055588382

key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9        0.83870968 0.92307692
 0.83870968 0.88461538 0.80645161 0.92592593]

mean value: 0.8908129595448839

key: train_precision
value: [0.92307692 0.90977444 0.91153846 0.90804598 0.91954023 0.90909091
 0.90494297 0.90909091 0.90636704 0.90804598]

mean value: 0.9109513829773443

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.85714286
 0.92857143 0.82142857 0.89285714 0.89285714]

mean value: 0.9076354679802956

key: train_recall
value: [0.94488189 0.95275591 0.93675889 0.93675889 0.94488189 0.94488189
 0.93700787 0.94488189 0.95275591 0.93307087]

mean value: 0.9428635896797485

key: test_roc_auc
value: [0.92980296 0.9476601  0.92980296 0.91194581 0.875      0.89285714
 0.875      0.85714286 0.83928571 0.91071429]

mean value: 0.8969211822660099

key: train_roc_auc
value: [0.93291525 0.92894712 0.92310386 0.92113535 0.93110236 0.92519685
 0.91929134 0.92519685 0.92716535 0.91929134]

mean value: 0.9253345678628117

key: test_jcc
value: [0.86666667 0.9        0.87096774 0.84375    0.78787879 0.8
 0.78787879 0.74193548 0.73529412 0.83333333]

mean value: 0.8167704919211086

key: train_jcc
value: [0.87591241 0.8705036  0.85869565 0.85559567 0.87272727 0.86330935
 0.85304659 0.86330935 0.86738351 0.85251799]

mean value: 0.8633001396827011
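The key names in each block (fit_time, score_time, then test_/train_ pairs per metric) match the dictionary returned by scikit-learn's `cross_validate` with a dict of scorers and `return_train_score=True`. The sketch below reproduces that layout on synthetic data from `make_classification`. The scoring dict is an assumption inferred only from the key names; in particular, scoring roc_auc with `make_scorer(roc_auc_score)` on hard predictions is a guess suggested by test_roc_auc closely tracking test_accuracy in these blocks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical scoring dict, inferred from the key names in this log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # computed from predict(), not probabilities
    "jcc": make_scorer(jaccard_score),
}

# Synthetic binary data standing in for the real feature table.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(LinearDiscriminantAnalysis(), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value), "\n")
```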

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02298999 0.00778127 0.00742817 0.00751019 0.00740218 0.00754929
 0.00757813 0.00776744 0.00745249 0.00765824]

mean value: 0.009111738204956055

key: score_time
value: [0.00836825 0.00810742 0.00794792 0.00793934 0.00797534 0.0079298
 0.00799894 0.00800514 0.00802684 0.00787163]

mean value: 0.00801706314086914

key: test_mcc
value: [0.8953202  0.82512315 0.85960591 0.78940887 0.71611487 0.75047877
 0.67900461 0.75047877 0.64450339 0.82195294]

mean value: 0.7731991486299565

key: train_mcc
value: [0.76340037 0.76340037 0.76741581 0.77919572 0.78351922 0.77588525
 0.78749923 0.78361641 0.79139378 0.77186893]

mean value: 0.776719508855672

key: test_accuracy
value: [0.94736842 0.9122807  0.92982456 0.89473684 0.85714286 0.875
 0.83928571 0.875      0.82142857 0.91071429]

mean value: 0.8862781954887218

key: train_accuracy
value: [0.8816568  0.8816568  0.88362919 0.88954635 0.89173228 0.88779528
 0.89370079 0.89173228 0.89566929 0.88582677]

mean value: 0.8882945844787152

key: test_fscore
value: [0.94736842 0.9122807  0.93103448 0.89655172 0.86206897 0.87272727
 0.84210526 0.87272727 0.82758621 0.90909091]

mean value: 0.8873541219820712

key: train_fscore
value: [0.8828125  0.8828125  0.88454012 0.89019608 0.89236791 0.88932039
 0.89453125 0.89278752 0.8962818  0.88715953]

mean value: 0.8892809598096044

key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.89655172 0.83333333 0.88888889
 0.82758621 0.88888889 0.8        0.92592593]

mean value: 0.8819795657726692

key: train_precision
value: [0.87596899 0.87596899 0.87596899 0.88326848 0.88715953 0.87739464
 0.8875969  0.88416988 0.89105058 0.87692308]

mean value: 0.8815470072299069

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
 0.85714286 0.85714286 0.85714286 0.89285714]

mean value: 0.8934729064039408

key: train_recall
value: [0.88976378 0.88976378 0.89328063 0.8972332  0.8976378  0.9015748
 0.9015748  0.9015748  0.9015748  0.8976378 ]

mean value: 0.8971616196196819

key: test_roc_auc
value: [0.9476601  0.91256158 0.92980296 0.89470443 0.85714286 0.875
 0.83928571 0.875      0.82142857 0.91071429]

mean value: 0.8863300492610838

key: train_roc_auc
value: [0.88164078 0.88164078 0.88364819 0.88956148 0.89173228 0.88779528
 0.89370079 0.89173228 0.89566929 0.88582677]

mean value: 0.8882947931903769

key: test_jcc
value: [0.9        0.83870968 0.87096774 0.8125     0.75757576 0.77419355
 0.72727273 0.77419355 0.70588235 0.83333333]

mean value: 0.7994628687252027

key: train_jcc
value: [0.79020979 0.79020979 0.79298246 0.80212014 0.80565371 0.8006993
 0.80918728 0.80633803 0.81205674 0.7972028 ]

mean value: 0.8006660030961745

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00975227 0.01262665 0.01330996 0.01325607 0.01203847 0.01280475
 0.01180601 0.01305366 0.01356101 0.01355767]

mean value: 0.012576651573181153

key: score_time
value: [0.00793934 0.01009583 0.01011109 0.01049399 0.01059914 0.01051426
 0.01052403 0.0104363  0.01050282 0.01055479]

mean value: 0.010177159309387207

key: test_mcc
value: [0.93202124 0.8953202  0.89952865 0.86189955 0.75047877 0.93094934
 0.79385662 0.78571429 0.56573571 0.78571429]

mean value: 0.8201218647363345

key: train_mcc
value: [0.90138807 0.90933566 0.85396037 0.85053095 0.9021413  0.91064232
 0.84093872 0.88232751 0.83427977 0.86279984]

mean value: 0.8748344497092373

key: test_accuracy
value: [0.96491228 0.94736842 0.94736842 0.92982456 0.875      0.96428571
 0.89285714 0.89285714 0.76785714 0.89285714]

mean value: 0.9075187969924812

key: train_accuracy
value: [0.95069034 0.95463511 0.92504931 0.92504931 0.9507874  0.95472441
 0.91929134 0.94094488 0.91338583 0.92913386]

mean value: 0.9363691779651804

key: test_fscore
value: [0.96296296 0.94736842 0.95081967 0.92857143 0.87272727 0.96296296
 0.9        0.89285714 0.8        0.89285714]

mean value: 0.9111127006122692

key: train_fscore
value: [0.95069034 0.95445545 0.92830189 0.92607004 0.9498998  0.95353535
 0.92220114 0.94186047 0.91881919 0.93258427]

mean value: 0.9378417921178791

key: test_precision
value: [1.         0.93103448 0.90625    0.96296296 0.88888889 1.
 0.84375    0.89285714 0.7027027  0.89285714]

mean value: 0.9021303323027461

key: train_precision
value: [0.95256917 0.96015936 0.88808664 0.91187739 0.96734694 0.97925311
 0.89010989 0.92748092 0.86458333 0.88928571]

mean value: 0.9230752474313746

key: test_recall
value: [0.92857143 0.96428571 1.         0.89655172 0.85714286 0.92857143
 0.96428571 0.89285714 0.92857143 0.89285714]

mean value: 0.9253694581280788

key: train_recall
value: [0.9488189  0.9488189  0.97233202 0.94071146 0.93307087 0.92913386
 0.95669291 0.95669291 0.98031496 0.98031496]

mean value: 0.9546901745977405

key: test_roc_auc
value: [0.96428571 0.9476601  0.94642857 0.93041872 0.875      0.96428571
 0.89285714 0.89285714 0.76785714 0.89285714]

mean value: 0.9074507389162563

key: train_roc_auc
value: [0.95069403 0.9546466  0.92514239 0.92508014 0.9507874  0.95472441
 0.91929134 0.94094488 0.91338583 0.92913386]

mean value: 0.9363830879835673

key: test_jcc
value: [0.92857143 0.9        0.90625    0.86666667 0.77419355 0.92857143
 0.81818182 0.80645161 0.66666667 0.80645161]

mean value: 0.8402004782851558

key: train_jcc
value: [0.90601504 0.91287879 0.86619718 0.86231884 0.90458015 0.91119691
 0.8556338  0.89010989 0.84982935 0.87368421]

mean value: 0.8832444168008685

MCC on Blind test: 0.16

Accuracy on Blind test: 0.45

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01402736 0.01417279 0.01328135 0.01392245 0.01277757 0.01429009
 0.01231146 0.01305246 0.01252604 0.01286674]

mean value: 0.013322830200195312

key: score_time
value: [0.0107367  0.0107584  0.01064038 0.01052928 0.01053858 0.01060414
 0.01047802 0.01054215 0.01091051 0.01058197]

mean value: 0.010632014274597168

key: test_mcc
value: [0.93202124 0.83703659 0.82942474 0.82490815 0.64951905 0.93094934
 0.82195294 0.85714286 0.59628479 0.92857143]

mean value: 0.8207811130466791

key: train_mcc
value: [0.89234379 0.81176962 0.87340231 0.8905544  0.89426234 0.88323242
 0.90174953 0.91732994 0.87948771 0.89075842]

mean value: 0.8834890481349534

key: test_accuracy
value: [0.96491228 0.9122807  0.9122807  0.9122807  0.82142857 0.96428571
 0.91071429 0.92857143 0.78571429 0.96428571]

mean value: 0.9076754385964912

key: train_accuracy
value: [0.94477318 0.89940828 0.93491124 0.94477318 0.94685039 0.94094488
 0.9507874  0.95866142 0.93897638 0.94488189]

mean value: 0.9404968239916756

key: test_fscore
value: [0.96296296 0.90196078 0.90909091 0.91525424 0.83333333 0.96551724
 0.9122807  0.92857143 0.8125     0.96428571]

mean value: 0.9105757312979906

key: train_fscore
value: [0.94262295 0.88984881 0.93167702 0.94594595 0.94777563 0.94252874
 0.95126706 0.95874263 0.94072658 0.94615385]

mean value: 0.9397289204487953

key: test_precision
value: [1.         1.         0.96153846 0.9        0.78125    0.93333333
 0.89655172 0.92857143 0.72222222 0.96428571]

mean value: 0.9087752884089091

key: train_precision
value: [0.98290598 0.98564593 0.97826087 0.9245283  0.93155894 0.91791045
 0.94208494 0.95686275 0.91449814 0.92481203]

mean value: 0.9459068329016868

key: test_recall
value: [0.92857143 0.82142857 0.86206897 0.93103448 0.89285714 1.
 0.92857143 0.92857143 0.92857143 0.96428571]

mean value: 0.9185960591133004

key: train_recall
value: [0.90551181 0.81102362 0.88932806 0.96837945 0.96456693 0.96850394
 0.96062992 0.96062992 0.96850394 0.96850394]

mean value: 0.9365581525629454

key: test_roc_auc
value: [0.96428571 0.91071429 0.91317734 0.91194581 0.82142857 0.96428571
 0.91071429 0.92857143 0.78571429 0.96428571]

mean value: 0.907512315270936

key: train_roc_auc
value: [0.94485077 0.89958296 0.93482151 0.94481964 0.94685039 0.94094488
 0.9507874  0.95866142 0.93897638 0.94488189]

mean value: 0.940517724316081

key: test_jcc
value: [0.92857143 0.82142857 0.83333333 0.84375    0.71428571 0.93333333
 0.83870968 0.86666667 0.68421053 0.93103448]

mean value: 0.8395323734112813

key: train_jcc
value: [0.89147287 0.80155642 0.87209302 0.8974359  0.90073529 0.89130435
 0.9070632  0.92075472 0.88808664 0.89781022]

mean value: 0.8868312626670497

MCC on Blind test: 0.23

Accuracy on Blind test: 0.67

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])
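The pipeline printed above scales the numerical features with MinMaxScaler, one-hot encodes the seven categorical features, and passes any other columns through unchanged before fitting the model. Below is a minimal sketch of how such a pipeline could be assembled with scikit-learn. The column names are a hypothetical subset, because the full numerical list is elided as '...' in the log. The helper build_pipeline is also hypothetical, and the data frame is synthetic rather than the rpoB data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical subset of the columns named in the log.
num_cols = ["ligand_distance", "duet_stability_change", "rsa"]
cat_cols = ["ss_class", "active_site"]

def build_pipeline(model, num_cols, cat_cols):
    """Hypothetical helper: scale numerical columns to [0, 1], one-hot
    encode categorical ones, pass remaining columns through, then fit model."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])

# Small synthetic frame, for illustration only.
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "duet_stability_change": rng.normal(0, 1, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice([0, 1], n),
})
y = (X["ligand_distance"] < 20).astype(int)

pipe = build_pipeline(AdaBoostClassifier(random_state=42), num_cols, cat_cols)
pipe.fit(X, y)
preds = pipe.predict(X)
print(pipe)
```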

key: fit_time
value: [0.10967422 0.09465766 0.0946455  0.09465408 0.09431386 0.09605312
 0.09578514 0.09517407 0.09609222 0.09636235]

mean value: 0.09674122333526611

key: score_time
value: [0.01450157 0.01411438 0.01444912 0.0144279  0.0141592  0.01532269
 0.01432991 0.01458526 0.01432729 0.01431394]

mean value: 0.014453125

key: test_mcc
value: [0.93202124 0.92980296 0.8953202  0.93202124 0.82618439 0.96490128
 0.96490128 0.89342711 0.96490128 0.89342711]

mean value: 0.919690809367707

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 0.96491228 0.91071429 0.98214286
 0.98214286 0.94642857 0.98214286 0.94642857]

mean value: 0.9592105263157894

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96296296 0.96428571 0.94736842 0.96666667 0.91525424 0.98181818
 0.98245614 0.94736842 0.98181818 0.94545455]

mean value: 0.9595453472750529

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96428571 0.93548387 0.87096774 1.
 0.96551724 0.93103448 1.         0.96296296]

mean value: 0.9594537728575548

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92857143 0.96428571 0.93103448 1.         0.96428571 0.96428571
 1.         0.96428571 0.96428571 0.92857143]

mean value: 0.9609605911330049

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96428571 0.96490148 0.9476601  0.96428571 0.91071429 0.98214286
 0.98214286 0.94642857 0.98214286 0.94642857]

mean value: 0.9591133004926109

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.92857143 0.93103448 0.9        0.93548387 0.84375    0.96428571
 0.96551724 0.9        0.96428571 0.89655172]

mean value: 0.9229480176386461

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
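The key/value/mean blocks above match the structure of a scikit-learn cross_validate result with return_train_score=True. Each block starts with fit_time and score_time, then lists a test_ and a train_ entry for every scorer, with one value per fold (10 here).

The sketch below reproduces that layout under a few assumptions:
- The scorer names (mcc, fscore, jcc, ...) are inferred from the keys.
- print_cv_results is a hypothetical helper.
- The dataset and estimator are synthetic stand-ins.

Wrapping roc_auc_score in make_scorer without requesting probabilities scores hard labels. In that case ROC AUC equals balanced accuracy, (TPR + TNR) / 2. The test_roc_auc values above are consistent with this. For example, AdaBoost fold 1 has recall 0.929 and precision 1.0 (so TNR = 1), giving a ROC AUC of 0.964.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             jaccard_score, make_scorer, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names are assumptions chosen to reproduce the keys in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard predictions
    "jcc": make_scorer(jaccard_score),
}

def print_cv_results(cv_results):
    """Hypothetical helper: print each entry in the key/value/mean layout."""
    for key, value in cv_results.items():
        print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")

# Synthetic stand-in data and model, not the rpoB features.
X, y = make_classification(n_samples=200, random_state=42)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_results = cross_validate(LogisticRegression(random_state=42), X, y, cv=skf,
                            scoring=scoring, return_train_score=True)
print_cv_results(cv_results)

# With hard 0/1 labels the ROC curve has a single interior point, so
# ROC AUC reduces to balanced accuracy, (TPR + TNR) / 2.
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])
print(roc_auc_score(y_true, y_pred), balanced_accuracy_score(y_true, y_pred))
```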

MCC on Blind test: 0.14

Accuracy on Blind test: 0.43
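The blind-test lines report MCC and accuracy rounded to two decimals. Python's round() drops trailing zeros, which would explain why some lines read 0.1 or 0.3. A minimal sketch follows, assuming the blind-set predictions come from the pipeline fitted on the training split. The log does not show how the blind set is defined, and blind_test_report is a hypothetical helper.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

def blind_test_report(y_true, y_pred):
    """Hypothetical helper: score blind-test predictions and print them
    rounded to two decimals, as in the log."""
    mcc = round(matthews_corrcoef(y_true, y_pred), 2)
    acc = round(accuracy_score(y_true, y_pred), 2)
    print(f"\nMCC on Blind test: {mcc}\n\nAccuracy on Blind test: {acc}")
    return mcc, acc

# Toy labels for illustration: 6 TP, 2 TN, 2 FP, 0 FN.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
mcc, acc = blind_test_report(y_true, y_pred)
```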

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
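The BaggingClassifier UserWarning and the accompanying 0/0 RuntimeWarning follow from the default n_estimators=10. A given row lands in a bootstrap sample with probability 1 - (1 - 1/n)^n, roughly 0.632. The probability that it is in-bag for all 10 estimators, and so never gets an out-of-bag prediction, is therefore about 0.632^10, or roughly 1%. With about 500 training rows per fold (557 rows, 10-fold CV), a few rows are expected to lack OOB scores.

The sketch below checks this on synthetic data, assuming scikit-learn. The helper fit_with_oob is hypothetical. Raising n_estimators (100 here, an arbitrary choice) makes the expected number of rows without OOB predictions negligible.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in data, roughly the size of one training fold.
X, y = make_classification(n_samples=500, random_state=42)
n = len(y)

def fit_with_oob(n_estimators):
    """Hypothetical helper: fit a BaggingClassifier with OOB scoring and
    return the model plus the text of any warnings raised while fitting."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf = BaggingClassifier(n_estimators=n_estimators, oob_score=True,
                                random_state=42).fit(X, y)
    return clf, [str(w.message) for w in caught]

for n_estimators in (10, 100):
    # Expected number of rows drawn into every bootstrap sample, i.e. rows
    # that never receive an out-of-bag prediction.
    p_in_bag = 1 - (1 - 1 / n) ** n
    expected_missing = n * p_in_bag ** n_estimators
    clf, messages = fit_with_oob(n_estimators)
    no_oob = any("do not have OOB scores" in m for m in messages)
    print(f"n_estimators={n_estimators}: expected rows without OOB "
          f"~{expected_missing:.2g}, warning raised: {no_oob}")
```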

key: fit_time
value: [0.03699589 0.04877973 0.05854607 0.05341744 0.04653335 0.03373337
 0.049371   0.03414083 0.04613113 0.05712962]

mean value: 0.04647784233093262

key: score_time
value: [0.02744269 0.02593279 0.03697324 0.0348525  0.0197053  0.0282557
 0.0171566  0.02019095 0.02783036 0.03664637]

mean value: 0.027498650550842284

key: test_mcc
value: [0.96547546 0.92980296 0.8953202  0.93202124 0.82618439 1.
 0.96490128 0.89342711 0.93094934 0.92857143]

mean value: 0.9266653398520664

key: train_mcc
value: [0.99214142 0.99211042 0.99214118 1.         0.99212598 0.98428248
 0.98825791 1.         0.99212598 0.98819663]

mean value: 0.9921382021238081

key: test_accuracy
value: [0.98245614 0.96491228 0.94736842 0.96491228 0.91071429 1.
 0.98214286 0.94642857 0.96428571 0.96428571]

mean value: 0.9627506265664161

key: train_accuracy
value: [0.99605523 0.99605523 0.99605523 1.         0.99606299 0.99212598
 0.99409449 1.         0.99606299 0.99409449]

mean value: 0.9960606625355263

key: test_fscore
value: [0.98181818 0.96428571 0.94736842 0.96666667 0.91525424 1.
 0.98245614 0.94736842 0.96296296 0.96428571]

mean value: 0.9632466459763516

key: train_fscore
value: [0.99604743 0.99606299 0.99603175 1.         0.99606299 0.99209486
 0.99405941 1.         0.99606299 0.99408284]

mean value: 0.9960505261077098

key: test_precision
value: [1.         0.96428571 0.96428571 0.93548387 0.87096774 1.
 0.96551724 0.93103448 1.         0.96428571]

mean value: 0.9595860479898299

key: train_precision
value: [1.         0.99606299 1.         1.         0.99606299 0.99603175
 1.         1.         0.99606299 0.99604743]

mean value: 0.9980268153239739

key: test_recall
value: [0.96428571 0.96428571 0.93103448 1.         0.96428571 1.
 1.         0.96428571 0.92857143 0.96428571]

mean value: 0.968103448275862

key: train_recall
value: [0.99212598 0.99606299 0.99209486 1.         0.99606299 0.98818898
 0.98818898 1.         0.99606299 0.99212598]

mean value: 0.9940913759297875

key: test_roc_auc
value: [0.98214286 0.96490148 0.9476601  0.96428571 0.91071429 1.
 0.98214286 0.94642857 0.96428571 0.96428571]

mean value: 0.9626847290640395

key: train_roc_auc
value: [0.99606299 0.99605521 0.99604743 1.         0.99606299 0.99212598
 0.99409449 1.         0.99606299 0.99409449]

mean value: 0.9960606579315926

key: test_jcc
value: [0.96428571 0.93103448 0.9        0.93548387 0.84375    1.
 0.96551724 0.9        0.92857143 0.93103448]

mean value: 0.9299677220721436

key: train_jcc
value: [0.99212598 0.99215686 0.99209486 1.         0.99215686 0.98431373
 0.98818898 1.         0.99215686 0.98823529]

mean value: 0.9921429430133137

MCC on Blind test: 0.15

Accuracy on Blind test: 0.38

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.15483451 0.16543531 0.17816734 0.18977332 0.14207339 0.17039871
 0.14135075 0.18826556 0.17411089 0.15683222]

mean value: 0.16612420082092286

key: score_time
value: [0.01990747 0.02101636 0.02146673 0.02004623 0.02186728 0.02010036
 0.01258993 0.02006197 0.02341485 0.02047729]

mean value: 0.020094847679138182

key: test_mcc
value: [0.8953202  0.86189955 0.82512315 0.79110556 0.75047877 0.75047877
 0.68250015 0.75047877 0.64951905 0.85714286]

mean value: 0.7814046839086336

key: train_mcc
value: [0.85051239 0.85019923 0.84231823 0.8428767  0.85465533 0.84293789
 0.83890131 0.85513299 0.87062545 0.84677832]

mean value: 0.8494937826202889

key: test_accuracy
value: [0.94736842 0.92982456 0.9122807  0.89473684 0.875      0.875
 0.83928571 0.875      0.82142857 0.92857143]

mean value: 0.8898496240601503

key: train_accuracy
value: [0.92504931 0.92504931 0.92110454 0.92110454 0.92716535 0.92125984
 0.91929134 0.92716535 0.93503937 0.92322835]

mean value: 0.9245457298606905

key: test_fscore
value: [0.94736842 0.93103448 0.9122807  0.9        0.87719298 0.87272727
 0.84745763 0.87272727 0.83333333 0.92857143]

mean value: 0.892269352249973

key: train_fscore
value: [0.92635659 0.92578125 0.92156863 0.92248062 0.92815534 0.92248062
 0.92038835 0.92870906 0.93617021 0.92427184]

mean value: 0.925636250953157

key: test_precision
value: [0.93103448 0.9        0.92857143 0.87096774 0.86206897 0.88888889
 0.80645161 0.88888889 0.78125    0.92857143]

mean value: 0.8786693438035207

key: train_precision
value: [0.91221374 0.91860465 0.91439689 0.90494297 0.91570881 0.90839695
 0.90804598 0.90943396 0.92015209 0.91187739]

mean value: 0.9123773428551641

key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.85714286
 0.89285714 0.85714286 0.89285714 0.92857143]

mean value: 0.9077586206896552

key: train_recall
value: [0.94094488 0.93307087 0.92885375 0.94071146 0.94094488 0.93700787
 0.93307087 0.9488189  0.95275591 0.93700787]

mean value: 0.9393187264635399

key: test_roc_auc
value: [0.9476601  0.93041872 0.91256158 0.89408867 0.875      0.875
 0.83928571 0.875      0.82142857 0.92857143]

mean value: 0.8899014778325124

key: train_roc_auc
value: [0.9250179  0.92503346 0.92111979 0.92114313 0.92716535 0.92125984
 0.91929134 0.92716535 0.93503937 0.92322835]

mean value: 0.9245463882232112

key: test_jcc
value: [0.9        0.87096774 0.83870968 0.81818182 0.78125    0.77419355
 0.73529412 0.77419355 0.71428571 0.86666667]

mean value: 0.807374283291029

key: train_jcc
value: [0.86281588 0.86181818 0.85454545 0.85611511 0.86594203 0.85611511
 0.85251799 0.86690647 0.88       0.85920578]

mean value: 0.8615982002257956

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.25657248 0.24637365 0.24630475 0.24570489 0.24687362 0.24694991
 0.24950528 0.24740005 0.24689674 0.24667573]

mean value: 0.2479257106781006

key: score_time
value: [0.00848842 0.00830841 0.00831699 0.00836349 0.00849056 0.00834179
 0.00837541 0.00853562 0.00849915 0.00830841]

mean value: 0.008402824401855469

key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82195294 1.
 0.96490128 0.89342711 0.96490128 0.92857143]

mean value: 0.9330856656296584

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9662907268170426

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96666667 0.9122807  1.
 0.98245614 0.94736842 0.98181818 0.96428571]

mean value: 0.9666496963411664

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96551724 0.93548387 0.89655172 1.
 0.96551724 0.93103448 1.         0.96428571]

mean value: 0.9622675989194343

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 1.         0.92857143 1.
 1.         0.96428571 0.96428571 0.96428571]

mean value: 0.971551724137931

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9661945812807883

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93548387 0.83870968 1.
 0.96551724 0.9        0.96428571 0.93103448]

mean value: 0.9363684517188411

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.3
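Gradient Boosting, like AdaBoost, scores 1.0 on every training metric. It averages 0.93 test MCC across the CV folds, yet reaches only 0.1 MCC on the blind test.

Below is a minimal sketch of a hypothetical helper, train_test_gaps, that summarises the train-minus-test gap for every metric in a cross_validate-style dict. It is demonstrated on the Gradient Boosting MCC values copied from the block above. A large positive gap can indicate overfitting. The helper does not cover the CV-to-blind drop, which has to be read from the blind-test lines.

```python
import numpy as np

def train_test_gaps(cv_results):
    """Hypothetical helper: for every metric that has both a train_ and a
    test_ entry, return mean(train) - mean(test)."""
    gaps = {}
    for key, test_values in cv_results.items():
        if not key.startswith("test_"):
            continue
        metric = key[len("test_"):]
        train_values = cv_results.get("train_" + metric)
        if train_values is not None:
            gaps[metric] = float(np.mean(train_values) - np.mean(test_values))
    return gaps

# Per-fold MCC values copied from the Gradient Boosting block above.
gb_results = {
    "test_mcc": np.array([0.96547546, 0.92980296, 0.92980296, 0.93202124, 0.82195294,
                          1.0, 0.96490128, 0.89342711, 0.96490128, 0.92857143]),
    "train_mcc": np.ones(10),
}
gaps = train_test_gaps(gb_results)
print(gaps)
```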

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
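QDA warns "Variables are collinear" when a class's centred feature matrix is rank-deficient. One plausible source here, an assumption not checked against the data, is the OneHotEncoder step. Within each class, the indicator columns of a categorical feature always sum to 1, so after centring they are linearly dependent. Constant columns within a class would have the same effect.

Two standard mitigations are sketched below on synthetic data:
- OneHotEncoder(drop="first") removes one redundant indicator per feature.
- QDA's reg_param regularises each class covariance estimate.

The value 0.1 is arbitrary and would need tuning. In scikit-learn's implementation, the rank check appears to run before regularisation, so the warning may persist even when the fit is stabilised.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; column names borrowed from the log for readability.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 20).astype(int)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        # drop="first" removes one redundant indicator per categorical feature
        ("cat", OneHotEncoder(drop="first"), ["ss_class"]),
    ],
    remainder="passthrough",
    sparse_threshold=0,  # QDA needs a dense matrix
)
qda = Pipeline(steps=[
    ("prep", prep),
    ("model", QuadraticDiscriminantAnalysis(reg_param=0.1)),
])
qda.fit(X, y)
proba = qda.predict_proba(X)
print(proba[:3])
```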

key: fit_time
value: [0.01195836 0.01368475 0.01422262 0.01405859 0.01395178 0.01564765
 0.01407719 0.01829219 0.02588701 0.01483369]

mean value: 0.0156613826751709

key: score_time
value: [0.01075363 0.0107801  0.01071596 0.01084328 0.01077461 0.01080632
 0.01084328 0.01160669 0.01160884 0.01082087]

mean value: 0.010955357551574707

key: test_mcc
value: [0.58069726 0.65466436 0.5920535  0.56277738 0.30588765 0.43876345
 0.77459667 0.64116714 0.57735027 0.55339859]

mean value: 0.5681356266624504

key: train_mcc
value: [0.6451496  0.68602482 0.64393328 0.68142563 0.57742076 0.7295157
 0.62763342 0.69688549 0.64324077 0.65891447]

mean value: 0.6590143937690011

key: test_accuracy
value: [0.75438596 0.8245614  0.77192982 0.77192982 0.64285714 0.71428571
 0.875      0.80357143 0.75       0.75      ]

mean value: 0.7658521303258146

key: train_accuracy
value: [0.79684418 0.82840237 0.79487179 0.82248521 0.7519685  0.86220472
 0.78740157 0.83070866 0.79724409 0.80708661]

mean value: 0.8079217723524205

key: test_fscore
value: [0.66666667 0.80769231 0.72340426 0.74509804 0.56521739 0.68
 0.85714286 0.76595745 0.66666667 0.68181818]

mean value: 0.7159663812634374

key: train_fscore
value: [0.74816626 0.8        0.74257426 0.78773585 0.671875   0.85355649
 0.73399015 0.79906542 0.74939173 0.76442308]

mean value: 0.7650778223767692

key: test_precision
value: [1.         0.875      0.94444444 0.86363636 0.72222222 0.77272727
 1.         0.94736842 1.         0.9375    ]

mean value: 0.9062898724082935

key: train_precision
value: [0.98709677 0.96132597 0.99337748 0.97660819 0.99230769 0.91071429
 0.98026316 0.98275862 0.98089172 0.98148148]

mean value: 0.9746825369455663

key: test_recall
value: [0.5        0.75       0.5862069  0.65517241 0.46428571 0.60714286
 0.75       0.64285714 0.5        0.53571429]

mean value: 0.5991379310344828

key: train_recall
value: [0.6023622  0.68503937 0.59288538 0.66007905 0.50787402 0.80314961
 0.58661417 0.67322835 0.60629921 0.62598425]

mean value: 0.6343515607979833

key: test_roc_auc
value: [0.75       0.82327586 0.77524631 0.77401478 0.64285714 0.71428571
 0.875      0.80357143 0.75       0.75      ]

mean value: 0.7658251231527093

key: train_roc_auc
value: [0.79722853 0.82868569 0.79447418 0.82216551 0.7519685  0.86220472
 0.78740157 0.83070866 0.79724409 0.80708661]

mean value: 0.8079168093118795

key: test_jcc
value: [0.5        0.67741935 0.56666667 0.59375    0.39393939 0.51515152
 0.75       0.62068966 0.5        0.51724138]

mean value: 0.5634857965079044

key: train_jcc
value: [0.59765625 0.66666667 0.59055118 0.64980545 0.50588235 0.74452555
 0.57976654 0.66536965 0.59922179 0.61867704]

mean value: 0.621812246508153

MCC on Blind test: 0.37

Accuracy on Blind test: 0.82
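For QDA fold 1, the printed precision (1.0), recall (0.5) and accuracy (0.75438596 = 43/57) fit only one confusion matrix: TP = 14, FN = 14, FP = 0, TN = 29. This matrix is inferred from the numbers and is not printed in the log. The sketch below recomputes that fold's metrics from it with the standard formulas.

The result also suggests that `test_roc_auc` here equals balanced accuracy, which is the AUC of hard 0/1 predictions, and was not computed from predicted probabilities. That is an inference from the values, not something the log states.

```python
from math import sqrt

def fold_metrics(tp, fn, fp, tn):
    """Binary metrics from one fold's confusion matrix (positive class = 1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity / TPR
    specificity = tn / (tn + fp)                 # TNR
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # ROC AUC of hard 0/1 predictions reduces to balanced accuracy
    auc_hard = (recall + specificity) / 2
    return dict(precision=precision, recall=recall, accuracy=accuracy,
                fscore=f1, jcc=jaccard, mcc=mcc, roc_auc=auc_hard)

# Inferred (not printed in the log) confusion matrix for QDA, fold 1:
# 57 test samples, 28 positives, precision 1.0 and recall 0.5
qda_fold1 = fold_metrics(tp=14, fn=14, fp=0, tn=29)
for name, value in qda_fold1.items():
    print(f"{name:>9}: {value:.8f}")
```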

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02161765 0.04498935 0.02991724 0.02177143 0.01122379 0.01113892
 0.01118159 0.02967238 0.01116443 0.01118255]

mean value: 0.020385932922363282

key: score_time
value: [0.01993227 0.02460504 0.02005219 0.01057243 0.0105114  0.01053619
 0.01052952 0.01053238 0.010499   0.0106318 ]

mean value: 0.013840222358703613

key: test_mcc
value: [0.8953202  0.8953202  0.85960591 0.79110556 0.71611487 0.82195294
 0.71611487 0.71611487 0.68250015 0.82195294]

mean value: 0.7916102525004516

key: train_mcc
value: [0.81126698 0.82324487 0.81877755 0.81895888 0.82769588 0.81142619
 0.81142619 0.82718204 0.83529327 0.8154727 ]

mean value: 0.8200744570697928

key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.89473684 0.85714286 0.91071429
 0.85714286 0.85714286 0.83928571 0.91071429]

mean value: 0.8951441102756892
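Every per-fold test accuracy in this block is a whole-number fraction of 57 (folds 1 to 4, e.g. 0.94736842 = 54/57) or of 56 (folds 5 to 10, e.g. 0.85714286 = 48/56). That pattern matches scikit-learn's `KFold` rule for 10 splits of 564 samples, under which the first `n_samples % n_splits` folds each get one extra sample. The total of 564 is inferred from these fractions; the log does not print the size of the CV set.

```python
def kfold_sizes(n_samples, n_splits):
    """Test-fold sizes under scikit-learn's KFold rule: the first
    n_samples % n_splits folds receive one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

# Assumption: 564 samples in the CV set (inferred from the fold accuracies)
N_CV = 564
test_sizes = kfold_sizes(N_CV, 10)
train_sizes = [N_CV - s for s in test_sizes]
print(test_sizes)
print(train_sizes)

# Ridge Classifier test_accuracy for folds 1 and 5, converted back to correct counts
correct_fold1 = round(0.94736842 * test_sizes[0])
correct_fold5 = round(0.85714286 * test_sizes[4])
print(correct_fold1, correct_fold5)
```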

key: train_accuracy
value: [0.90532544 0.9112426  0.90927022 0.90927022 0.91338583 0.90551181
 0.90551181 0.91338583 0.91732283 0.90748031]

mean value: 0.9097706906459178

key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.9        0.86206897 0.90909091
 0.86206897 0.85185185 0.84745763 0.90909091]

mean value: 0.8967400553050681

key: train_fscore
value: [0.90733591 0.9132948  0.91015625 0.91050584 0.91538462 0.90697674
 0.90697674 0.91472868 0.91891892 0.90909091]

mean value: 0.9113369405536723

key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.87096774 0.83333333 0.92592593
 0.83333333 0.88461538 0.80645161 0.92592593]

mean value: 0.8873656706248475

key: train_precision
value: [0.89015152 0.89433962 0.8996139  0.89655172 0.89473684 0.89312977
 0.89312977 0.90076336 0.90151515 0.89353612]

mean value: 0.8957467777601632

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
 0.89285714 0.82142857 0.89285714 0.89285714]

mean value: 0.9076354679802956

key: train_recall
value: [0.92519685 0.93307087 0.92094862 0.92490119 0.93700787 0.92125984
 0.92125984 0.92913386 0.93700787 0.92519685]

mean value: 0.9274983660639258

key: test_roc_auc
value: [0.9476601  0.9476601  0.92980296 0.89408867 0.85714286 0.91071429
 0.85714286 0.85714286 0.83928571 0.91071429]

mean value: 0.8951354679802956

key: train_roc_auc
value: [0.90528617 0.91119946 0.90929321 0.90930099 0.91338583 0.90551181
 0.90551181 0.91338583 0.91732283 0.90748031]

mean value: 0.9097678254645047

key: test_jcc
value: [0.9        0.9        0.87096774 0.81818182 0.75757576 0.83333333
 0.75757576 0.74193548 0.73529412 0.83333333]

mean value: 0.814819734345351
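Per fold, `test_jcc` adds no information beyond `test_fscore`. From F1 = 2TP/(2TP + FP + FN) and J = TP/(TP + FP + FN), the Jaccard index follows as J = F1 / (2 - F1). The check below applies that identity to the Ridge Classifier folds.

```python
# Ridge Classifier per-fold test_fscore and test_jcc, copied from the log
ridge_f1 = [0.94736842, 0.94736842, 0.93103448, 0.9, 0.86206897,
            0.90909091, 0.86206897, 0.85185185, 0.84745763, 0.90909091]
ridge_jcc = [0.9, 0.9, 0.87096774, 0.81818182, 0.75757576,
             0.83333333, 0.75757576, 0.74193548, 0.73529412, 0.83333333]

def jaccard_from_f1(f1):
    """J = F1 / (2 - F1), from F1 = 2TP/(2TP+FP+FN) and J = TP/(TP+FP+FN)."""
    return f1 / (2 - f1)

max_err = max(abs(jaccard_from_f1(f) - j) for f, j in zip(ridge_f1, ridge_jcc))
print(f"largest |J - F1/(2-F1)| across folds: {max_err:.1e}")
```

The identity holds fold by fold only. The mean of the per-fold J values is not the same as F1/(2 - F1) applied to the mean F1.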

key: train_jcc
value: [0.83038869 0.84042553 0.83512545 0.83571429 0.84397163 0.82978723
 0.82978723 0.84285714 0.85       0.83333333]

mean value: 0.8371390533718615

MCC on Blind test: 0.25

Accuracy on Blind test: 0.71

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
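The two `SettingWithCopyWarning` messages point at `smnc_CT.sort_values(..., inplace=True)` and `smnc_BT.sort_values(..., inplace=True)` in `rpob_config.py`. pandas raises this warning when an in-place operation runs on a frame it has flagged as a possible slice of another frame. The log does not show how `smnc_CT` and `smnc_BT` are built, so the table below is a hypothetical stand-in filled with the mean test MCC (rounded) and blind-test MCC values printed in this section.

The sketch shows two common fixes, both under that assumption:

1. Take an explicit `.copy()` before modifying the frame.
2. Drop `inplace=True` and rebind the name to the sorted result.

```python
import pandas as pd

# Hypothetical stand-in for the per-model summary table (values from this log)
scores = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV", "Logistic Regression"],
    "test_mcc": [0.568, 0.792, 0.795, 0.806],
    "bts_mcc": [0.37, 0.25, 0.25, 0.28],
})

# A column subset like this may be flagged as a possible view of `scores`.
# Option 1: take an explicit copy before modifying it in place.
smnc_CT = scores[["model", "test_mcc"]].copy()
smnc_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: avoid inplace=True and rebind the sorted result.
smnc_BT = scores[["model", "bts_mcc"]]
smnc_BT = smnc_BT.sort_values(by=["bts_mcc"], ascending=False)

print(smnc_CT)
print(smnc_BT)
```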
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.18555856 0.20449281 0.19151855 0.191679   0.19248724 0.19242501
 0.20449567 0.27775383 0.19228506 0.1919651 ]

mean value: 0.20246608257293702

key: score_time
value: [0.02051473 0.01998162 0.02048826 0.02080917 0.02009439 0.01971388
 0.0109446  0.02007937 0.01075292 0.01076293]

mean value: 0.017414188385009764

key: test_mcc
value: [0.85960591 0.8953202  0.85960591 0.82490815 0.75434227 0.82195294
 0.71611487 0.71611487 0.68250015 0.82195294]

mean value: 0.7952418219423117

key: train_mcc
value: [0.86225372 0.8551535  0.84648438 0.83474492 0.86253233 0.8431734
 0.81142619 0.85105352 0.83529327 0.8154727 ]

mean value: 0.8417587938288557

key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807  0.875      0.91071429
 0.85714286 0.85714286 0.83928571 0.91071429]

mean value: 0.8969298245614035

key: train_accuracy
value: [0.93096647 0.9270217  0.92307692 0.91715976 0.93110236 0.92125984
 0.90551181 0.92519685 0.91732283 0.90748031]

mean value: 0.9206098867819037

key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
 0.86206897 0.85185185 0.84745763 0.90909091]

mean value: 0.8983144764543761

key: train_fscore
value: [0.93203883 0.92898273 0.92397661 0.91828794 0.93203883 0.92277992
 0.90697674 0.92664093 0.91891892 0.90909091]

mean value: 0.9219732362977793

key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9        0.83870968 0.92592593
 0.83333333 0.88461538 0.80645161 0.92592593]

mean value: 0.8905602254211821

key: train_precision
value: [0.91954023 0.90636704 0.91153846 0.90421456 0.91954023 0.90530303
 0.89312977 0.90909091 0.90151515 0.89353612]

mean value: 0.9063775505468512

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
 0.89285714 0.82142857 0.89285714 0.89285714]

mean value: 0.9076354679802956

key: train_recall
value: [0.94488189 0.95275591 0.93675889 0.93280632 0.94488189 0.94094488
 0.92125984 0.94488189 0.93700787 0.92519685]

mean value: 0.9381376241013352

key: test_roc_auc
value: [0.92980296 0.9476601  0.92980296 0.91194581 0.875      0.91071429
 0.85714286 0.85714286 0.83928571 0.91071429]

mean value: 0.8969211822660099

key: train_roc_auc
value: [0.93093897 0.92697084 0.92310386 0.91719056 0.93110236 0.92125984
 0.90551181 0.92519685 0.91732283 0.90748031]

mean value: 0.920607824219601

key: test_jcc
value: [0.86666667 0.9        0.87096774 0.84375    0.78787879 0.83333333
 0.75757576 0.74193548 0.73529412 0.83333333]

mean value: 0.817073522224139

key: train_jcc
value: [0.87272727 0.86738351 0.85869565 0.84892086 0.87272727 0.85663082
 0.82978723 0.86330935 0.85       0.83333333]

mean value: 0.8553515317749246

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03999615 0.02583027 0.02410555 0.02264023 0.02619028 0.02323699
 0.02574015 0.02130461 0.0231998  0.02364349]

mean value: 0.0255887508392334

key: score_time
value: [0.01082921 0.01091146 0.01050711 0.01049757 0.01048684 0.01047969
 0.01068377 0.01047754 0.01049089 0.01047397]

mean value: 0.010583806037902831

key: test_mcc
value: [0.8953202  0.8953202  0.82512315 0.82490815 0.71611487 0.89342711
 0.71611487 0.75047877 0.68250015 0.85933785]

mean value: 0.8058645326851578

key: train_mcc
value: [0.83454496 0.83472439 0.83070006 0.83456039 0.85486752 0.81527029
 0.83505996 0.83076661 0.8355787  0.81511857]

mean value: 0.8321191457866195

key: test_accuracy
value: [0.94736842 0.94736842 0.9122807  0.9122807  0.85714286 0.94642857
 0.85714286 0.875      0.83928571 0.92857143]

mean value: 0.9022869674185463

key: train_accuracy
value: [0.91715976 0.91715976 0.91518738 0.91715976 0.92716535 0.90748031
 0.91732283 0.91535433 0.91732283 0.90748031]

mean value: 0.9158792650918636

key: test_fscore
value: [0.94736842 0.94736842 0.9122807  0.91525424 0.86206897 0.94545455
 0.86206897 0.87272727 0.84745763 0.92592593]

mean value: 0.9037975083408656

key: train_fscore
value: [0.91828794 0.91860465 0.91617934 0.91796875 0.92843327 0.90873786
 0.91860465 0.91585127 0.91923077 0.90838207]

mean value: 0.9170280567760439

key: test_precision
value: [0.93103448 0.93103448 0.92857143 0.9        0.83333333 0.96296296
 0.83333333 0.88888889 0.80645161 0.96153846]

mean value: 0.8977148987048875

key: train_precision
value: [0.90769231 0.90458015 0.90384615 0.90733591 0.91254753 0.89655172
 0.90458015 0.91050584 0.89849624 0.8996139 ]

mean value: 0.9045749903664201

key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.92857143
 0.89285714 0.85714286 0.89285714 0.89285714]

mean value: 0.9113300492610837

key: train_recall
value: [0.92913386 0.93307087 0.92885375 0.92885375 0.94488189 0.92125984
 0.93307087 0.92125984 0.94094488 0.91732283]

mean value: 0.9298652391771187

key: test_roc_auc
value: [0.9476601  0.9476601  0.91256158 0.91194581 0.85714286 0.94642857
 0.85714286 0.875      0.83928571 0.92857143]

mean value: 0.9023399014778325

key: train_roc_auc
value: [0.9171361  0.91712832 0.91521428 0.91718278 0.92716535 0.90748031
 0.91732283 0.91535433 0.91732283 0.90748031]

mean value: 0.9158787463819987

key: test_jcc
value: [0.9        0.9        0.83870968 0.84375    0.75757576 0.89655172
 0.75757576 0.77419355 0.73529412 0.86206897]

mean value: 0.8265719548260198

key: train_jcc
value: [0.84892086 0.84946237 0.84532374 0.84837545 0.86642599 0.83274021
 0.84946237 0.84476534 0.85053381 0.83214286]

mean value: 0.8468153000998123

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71
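For all four models in this section, the blind-test MCC is well below the 10-fold mean test MCC. The ordering also changes. QDA has the lowest CV MCC (0.57) but the highest blind-test MCC (0.37) and the highest blind-test accuracy (0.82). The tabulation below uses only values printed above.

```python
# Mean 10-fold test_mcc (rounded to 4 decimals) and blind-test MCC, from the log
results = {
    "QDA": (0.5681, 0.37),
    "Ridge Classifier": (0.7916, 0.25),
    "Ridge ClassifierCV": (0.7952, 0.25),
    "Logistic Regression": (0.8059, 0.28),
}

drops = {model: cv - blind for model, (cv, blind) in results.items()}
ranked_by_cv = sorted(results, key=lambda m: results[m][0], reverse=True)
best_blind = max(results, key=lambda m: results[m][1])

for model in ranked_by_cv:
    cv_mcc, blind_mcc = results[model]
    print(f"{model:<20} CV {cv_mcc:.2f}  blind {blind_mcc:.2f}  drop {drops[model]:.2f}")
print("best on blind test:", best_blind)
```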

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
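The repeated `ConvergenceWarning` comes from the default lbfgs solver inside `LogisticRegressionCV`, which stops at its default `max_iter=100`. The pipeline already MinMax-scales the numerical features, so the warning's other suggestion, raising `max_iter`, is the obvious next step. The sketch below does that on synthetic data. Both the data (a stand-in for the real feature table) and the value `max_iter=1000` are assumptions, not tuned settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real pipeline uses the ColumnTransformer shown above
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # max_iter raised from the default of 100; 1000 is an assumed, untuned value
    ("model", LogisticRegressionCV(max_iter=1000, random_state=42)),
])
pipe.fit(X, y)
print("selected C:", pipe.named_steps["model"].C_)
print("training accuracy:", pipe.score(X, y))
```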
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
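The warning above is printed once per lbfgs fit that stops at its iteration limit. The warning itself suggests scaling the features or raising `max_iter`. The sketch below uses synthetic data (an assumption, not the dataset from this run). It counts ConvergenceWarnings for a default `LogisticRegressionCV` and for a scaled pipeline with a larger `max_iter`. The value 5000 is an arbitrary illustrative choice. This run already applies `MinMaxScaler` to its numeric columns, so raising `max_iter` (or trying another `solver`) is probably the more relevant lever here. That is an inference; the log does not confirm it.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): shape loosely mirrors the 557 x 53 dataset in this log.
X, y = make_classification(n_samples=557, n_features=53, random_state=42)
X[:, 0] *= 1000  # one badly scaled column, a common cause of slow lbfgs convergence


def count_convergence_warnings(model):
    """Fit `model` on X, y and return how many ConvergenceWarnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# Default settings: lbfgs with max_iter=100 on unscaled features.
baseline = LogisticRegressionCV(random_state=42)
# Remedy named in the warning text: scale the features and allow more iterations.
remedied = make_pipeline(StandardScaler(), LogisticRegressionCV(max_iter=5000, random_state=42))

n_base = count_convergence_warnings(baseline)
n_fixed = count_convergence_warnings(remedied)
print("warnings (default):", n_base, "| warnings (scaled, max_iter=5000):", n_fixed)
```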
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
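The pipeline printed above pairs a `ColumnTransformer` with a classifier. The transformer applies `MinMaxScaler` to the numeric columns, applies `OneHotEncoder` to the categorical ones, and passes any other columns through. A minimal sketch of that structure follows. It assumes a toy DataFrame with made-up random values. Only the column names `ligand_distance`, `rsa` and `ss_class` come from the log; the `ss_class` category labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame (assumption): random values, not real data.
rng = np.random.default_rng(42)
n = 60
df = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),  # hypothetical labels
})
y = rng.integers(0, 2, n)

num_cols = list(df.select_dtypes(include="number").columns)
cat_cols = list(df.select_dtypes(exclude="number").columns)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",  # columns not listed above are passed through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegressionCV(random_state=42))])
pipe.fit(df, y)

X_t = pipe.named_steps["prep"].transform(df)
print(X_t.shape)  # 2 scaled numeric columns + 3 one-hot columns
```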

key: fit_time
value: [0.74657226 0.72350025 0.66616726 0.69195914 0.85434008 0.67523313
 0.67365026 0.74283338 0.70560384 0.68593574]

mean value: 0.716579532623291

key: score_time
value: [0.01196027 0.01932144 0.020437   0.01222825 0.01223254 0.01219296
 0.01108098 0.01215911 0.01234174 0.01253176]

mean value: 0.013648605346679688

key: test_mcc
value: [0.93202124 0.92980296 0.92980296 0.85960591 0.78772636 1.
 0.85933785 0.85714286 0.78772636 0.85714286]

mean value: 0.8800309350106305

key: train_mcc
value: [0.93691352 0.93691352 0.94480151 0.93691156 0.93703692 0.93703692
 0.92913386 0.9332517  0.92520402 0.9330781 ]

mean value: 0.9350281642225636

key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.92982456 0.89285714 1.
 0.92857143 0.92857143 0.89285714 0.92857143]

mean value: 0.9395989974937343

key: train_accuracy
value: [0.96844181 0.96844181 0.97238659 0.96844181 0.96850394 0.96850394
 0.96456693 0.96653543 0.96259843 0.96653543]

mean value: 0.9674956126046373

key: test_fscore
value: [0.96296296 0.96428571 0.96551724 0.93103448 0.89655172 1.
 0.93103448 0.92857143 0.89655172 0.92857143]

mean value: 0.9405081189563949

key: train_fscore
value: [0.96837945 0.96837945 0.97222222 0.96825397 0.96837945 0.96837945
 0.96456693 0.96620278 0.96267191 0.96646943]

mean value: 0.9673905023176848

key: test_precision
value: [1.         0.96428571 0.96551724 0.93103448 0.86666667 1.
 0.9        0.92857143 0.86666667 0.92857143]

mean value: 0.9351313628899836

key: train_precision
value: [0.97222222 0.97222222 0.97609562 0.97211155 0.97222222 0.97222222
 0.96456693 0.97590361 0.96078431 0.96837945]

mean value: 0.9706730364161126

key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 1.
 0.96428571 0.92857143 0.92857143 0.92857143]

mean value: 0.9467980295566503

key: train_recall
value: [0.96456693 0.96456693 0.96837945 0.96442688 0.96456693 0.96456693
 0.96456693 0.95669291 0.96456693 0.96456693]

mean value: 0.9641467741433506

key: test_roc_auc
value: [0.96428571 0.96490148 0.96490148 0.92980296 0.89285714 1.
 0.92857143 0.92857143 0.89285714 0.92857143]

mean value: 0.9395320197044336

key: train_roc_auc
value: [0.96844947 0.96844947 0.9723787  0.96843391 0.96850394 0.96850394
 0.96456693 0.96653543 0.96259843 0.96653543]

mean value: 0.9674955650306558

key: test_jcc
value: [0.92857143 0.93103448 0.93333333 0.87096774 0.8125     1.
 0.87096774 0.86666667 0.8125     0.86666667]

mean value: 0.8893208061867683

key: train_jcc
value: [0.93869732 0.93869732 0.94594595 0.93846154 0.93869732 0.93869732
 0.93155894 0.93461538 0.9280303  0.9351145 ]

mean value: 0.9368515883261834
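Each `key:`/`value:`/`mean value:` triple above matches the dictionary that scikit-learn's `cross_validate` returns when given `return_train_score=True` and a dictionary of named scorers. That dictionary holds `fit_time` and `score_time`, then a `test_<name>` and a `train_<name>` entry per scorer, with one value per fold. The sketch below reproduces that layout on synthetic data. The scorer definitions and the 10-fold `StratifiedKFold` are assumptions inferred from the key names and the ten values per key; they are not taken from the script. One detail may be worth checking in the real script. The `test_roc_auc` values track `test_accuracy` almost exactly. AUC computed from hard 0/1 predictions (as with `make_scorer(roc_auc_score)` here) equals balanced accuracy, which would produce that pattern on near-balanced folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=557, n_features=20, random_state=42)  # synthetic data

# Assumed scorer names, chosen so the output keys match those in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # AUC of hard 0/1 predictions, not probabilities
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(GaussianNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```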

MCC on Blind test: 0.23

Accuracy on Blind test: 0.65
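The blind-test lines report MCC and accuracy, rounded to two decimals, for predictions on data held out from cross-validation. For this model the blind-test MCC (0.23) is far below the cross-validated mean MCC (0.88). The log does not show how the original script builds the blind set. The sketch below assumes a simple stratified 80/20 split of synthetic data, purely to illustrate the evaluation step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=700, n_features=20, random_state=42)  # synthetic data

# Assumption: the "blind" set is a stratified hold-out never used for fitting or CV.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

bts_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
bts_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", bts_mcc)
print("Accuracy on Blind test:", bts_acc)
```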

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])
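The pipeline printed above scales the numerical columns with MinMaxScaler, one-hot encodes the categorical columns, and passes any remaining columns through unchanged (`remainder='passthrough'`) before the classifier. Below is a minimal sketch of the same structure. The toy DataFrame, its values, and the reduced column set are assumptions for illustration only. The column names are borrowed from the log, but the data is not the rpoB feature set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data (assumption): a few column names taken from the log, made-up values.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "duet_stability_change": rng.normal(0, 1, 60),
    "ss_class": rng.choice(["helix", "sheet", "loop"], 60),
    "active_site": rng.choice(["yes", "no"], 60),
})
y = rng.integers(0, 2, 60)

# Split columns by dtype, mirroring the 'num' / 'cat' Index objects in the log.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",                 # keep any unlisted columns as-is
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])
pipe.fit(X, y)
preds = pipe.predict(X)
```

Because the scaler and encoder are fitted inside the pipeline, each CV fold refits them on that fold's training part only. This keeps test-fold statistics out of the preprocessing.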

key: fit_time
value: [0.01068711 0.00849342 0.00775218 0.0076437  0.00741124 0.00742817
 0.00740385 0.0076189  0.00729799 0.0073278 ]

mean value: 0.007906436920166016

key: score_time
value: [0.0128932  0.00827646 0.00841212 0.00812268 0.00794554 0.00796461
 0.00785041 0.00781918 0.00777936 0.00783634]

mean value: 0.008489990234375

key: test_mcc
value: [0.77728159 0.68736396 0.77903565 0.56277738 0.47187011 0.58501794
 0.72168784 0.65814518 0.70082556 0.65814518]

mean value: 0.6602150384851577
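Each "mean value" line is the arithmetic mean of the ten per-fold scores printed above it. The fold-to-fold spread is worth reading alongside the mean: for Gaussian NB, test MCC ranges from 0.47 to 0.78 across folds. The short check below uses the logged values.

```python
import numpy as np

# Per-fold test MCC for Gaussian NB, copied from the log above.
test_mcc = np.array([
    0.77728159, 0.68736396, 0.77903565, 0.56277738, 0.47187011,
    0.58501794, 0.72168784, 0.65814518, 0.70082556, 0.65814518,
])
mean_mcc = test_mcc.mean()  # reproduces the logged "mean value"
spread = test_mcc.std()     # fold-to-fold variability, not printed by the script
print(f"mean={mean_mcc:.4f} sd={spread:.4f} "
      f"min={test_mcc.min():.2f} max={test_mcc.max():.2f}")
```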

key: train_mcc
value: [0.66258992 0.65336491 0.67038524 0.68202471 0.62396093 0.66768511
 0.66768511 0.72158618 0.67809175 0.67572951]

mean value: 0.6703103372008967

key: test_accuracy
value: [0.87719298 0.84210526 0.87719298 0.77192982 0.73214286 0.78571429
 0.85714286 0.82142857 0.83928571 0.82142857]

mean value: 0.8225563909774436

key: train_accuracy
value: [0.82445759 0.81854043 0.82840237 0.83234714 0.79527559 0.82677165
 0.82677165 0.85826772 0.83267717 0.83070866]

mean value: 0.8274219975461647

key: test_fscore
value: [0.85714286 0.83018868 0.8627451  0.74509804 0.70588235 0.76
 0.84615385 0.8        0.81632653 0.8       ]

mean value: 0.8023537403350309

key: train_fscore
value: [0.80525164 0.79646018 0.80879121 0.81069042 0.75586854 0.80701754
 0.80701754 0.84937238 0.81481481 0.81140351]

mean value: 0.8066687790927018

key: test_precision
value: [1.         0.88       1.         0.86363636 0.7826087  0.86363636
 0.91666667 0.90909091 0.95238095 0.90909091]

mean value: 0.9077110860154338

key: train_precision
value: [0.90640394 0.90909091 0.91089109 0.92857143 0.93604651 0.91089109
 0.91089109 0.90625    0.91219512 0.91584158]

mean value: 0.9147072763613312

key: test_recall
value: [0.75       0.78571429 0.75862069 0.65517241 0.64285714 0.67857143
 0.78571429 0.71428571 0.71428571 0.71428571]

mean value: 0.7199507389162562

key: train_recall
value: [0.72440945 0.70866142 0.72727273 0.71936759 0.63385827 0.72440945
 0.72440945 0.7992126  0.73622047 0.72834646]

mean value: 0.7226167875260652

key: test_roc_auc
value: [0.875      0.841133   0.87931034 0.77401478 0.73214286 0.78571429
 0.85714286 0.82142857 0.83928571 0.82142857]

mean value: 0.8226600985221675
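The test_roc_auc values track test_accuracy almost exactly. The logged recall, precision, and accuracy are enough to recover Gaussian NB's fold-1 confusion matrix on 57 samples:

- recall 0.75 on 28 positives gives TP = 21 and FN = 7;
- precision 1.0 gives FP = 0;
- accuracy 50/57 then gives TN = 29.

The balanced accuracy is therefore (21/28 + 29/29) / 2 = 0.875, which equals the logged test_roc_auc. This pattern is what you get when ROC AUC is computed from hard 0/1 predictions rather than from probabilities, for example with `make_scorer(roc_auc_score)`. Whether the script actually scores it that way is an assumption. The check below uses the reconstructed labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Labels reconstructed from the logged fold-1 metrics (not the real predictions):
# 28 positives (21 TP, 7 FN) followed by 29 negatives (29 TN, 0 FP).
y_true = np.array([1] * 28 + [0] * 29)
y_pred = np.array([1] * 21 + [0] * 7 + [0] * 29)

acc = accuracy_score(y_true, y_pred)           # 50 / 57
bal = balanced_accuracy_score(y_true, y_pred)  # (21/28 + 29/29) / 2
auc = roc_auc_score(y_true, y_pred)            # AUC of a hard 0/1 "score"
print(f"accuracy={acc:.8f} balanced={bal:.8f} roc_auc={auc:.8f}")
```

With a binary score there is only one non-trivial threshold, so the "ROC AUC" collapses to balanced accuracy. Passing `predict_proba` output instead would give a threshold-free AUC.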

key: train_roc_auc
value: [0.82465532 0.81875759 0.82820329 0.83212474 0.79527559 0.82677165
 0.82677165 0.85826772 0.83267717 0.83070866]

mean value: 0.8274213376489994

key: test_jcc
value: [0.75       0.70967742 0.75862069 0.59375    0.54545455 0.61290323
 0.73333333 0.66666667 0.68965517 0.66666667]

mean value: 0.6726727719351467
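test_jcc carries no information beyond test_fscore. For binary labels, J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), so J = F1 / (2 - F1). The first four Gaussian NB folds confirm this:

```python
import numpy as np

# Gaussian NB test F1 and test Jaccard, first four folds, copied from the log.
f1 = np.array([0.85714286, 0.83018868, 0.8627451, 0.74509804])
jcc = np.array([0.75, 0.70967742, 0.75862069, 0.59375])

# Jaccard is a monotone function of F1 for a single positive class.
jcc_from_f1 = f1 / (2 - f1)
print(np.round(jcc_from_f1, 8))
```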

key: train_jcc
value: [0.67399267 0.66176471 0.67896679 0.68164794 0.60754717 0.67647059
 0.67647059 0.73818182 0.6875     0.68265683]

mean value: 0.6765199100649822
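The key names in each block (fit_time, score_time, then test_X / train_X pairs) match what scikit-learn's `cross_validate` returns for a scoring dict with `return_train_score=True`. The sketch below reproduces that output shape. Several details are assumptions chosen only to match the log's keys, not the script's actual code:

- the scorer names (`mcc`, `fscore`, `jcc`, and so on);
- the use of `make_scorer(roc_auc_score)`;
- `StratifiedKFold(n_splits=10)`;
- the synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Hypothetical scorer names picked to reproduce keys such as test_mcc / train_jcc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predict() labels, not probabilities
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the real feature matrix.
X, y = make_classification(n_samples=560, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(GaussianNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```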

MCC on Blind test: 0.34

Accuracy on Blind test: 0.78
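For Gaussian NB, the blind-test MCC (0.34) is about half the mean CV test MCC (0.66). Blind accuracy (0.78), by contrast, stays close to CV accuracy (0.82). A large MCC drop with stable accuracy is typical when the held-out set is class-imbalanced, but the log does not show the blind set's class balance, so this is only a possible explanation. The sketch below shows one way such a blind score can be produced. It assumes the model is refitted on the full training data, scored once on the held-out set, and rounded to two decimals; the data and the split are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in: the real blind set and how it was split off are not in the log.
X, y = make_classification(n_samples=800, n_features=20, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Fit once on all training data, then score the untouched blind set.
model = GaussianNB().fit(X_train, y_train)
y_hat = model.predict(X_blind)

mcc_blind = round(matthews_corrcoef(y_blind, y_hat), 2)
acc_blind = round(accuracy_score(y_blind, y_hat), 2)
print("MCC on Blind test:", mcc_blind)
print("Accuracy on Blind test:", acc_blind)
```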

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00778794 0.00758052 0.00752497 0.00759459 0.00758076 0.00756574
 0.00753331 0.00752449 0.00757456 0.0075686 ]

mean value: 0.00758354663848877

key: score_time
value: [0.00791955 0.00788665 0.00793982 0.00793576 0.00796843 0.00790358
 0.00784445 0.00793791 0.00797367 0.00794005]

mean value: 0.007924985885620118

key: test_mcc
value: [0.8953202  0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
 0.64285714 0.75047877 0.64450339 0.82195294]

mean value: 0.7625646979424463

key: train_mcc
value: [0.75941547 0.75148224 0.759525   0.75544282 0.77167747 0.77186893
 0.77564465 0.77588525 0.78749923 0.76800824]

mean value: 0.7676449294755058

key: test_accuracy
value: [0.94736842 0.9122807  0.92982456 0.85964912 0.85714286 0.875
 0.82142857 0.875      0.82142857 0.91071429]

mean value: 0.8809837092731829

key: train_accuracy
value: [0.87968442 0.87573964 0.87968442 0.87771203 0.88582677 0.88582677
 0.88779528 0.88779528 0.89370079 0.88385827]

mean value: 0.8837623662426812

key: test_fscore
value: [0.94736842 0.9122807  0.93103448 0.86206897 0.86206897 0.87272727
 0.82142857 0.87272727 0.82758621 0.90909091]

mean value: 0.8818381769470699

key: train_fscore
value: [0.88062622 0.8762279  0.88062622 0.87698413 0.88627451 0.88715953
 0.88845401 0.88932039 0.89453125 0.88543689]

mean value: 0.8845641057179913

key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
 0.82142857 0.88888889 0.8        0.92592593]

mean value: 0.8779155263638022

key: train_precision
value: [0.87548638 0.8745098  0.87209302 0.88047809 0.8828125  0.87692308
 0.88326848 0.87739464 0.8875969  0.87356322]

mean value: 0.8784126109194028

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8864532019704433

key: train_recall
value: [0.88582677 0.87795276 0.88932806 0.87351779 0.88976378 0.8976378
 0.89370079 0.9015748  0.9015748  0.8976378 ]

mean value: 0.8908515141140954

key: test_roc_auc
value: [0.9476601  0.91256158 0.92980296 0.85960591 0.85714286 0.875
 0.82142857 0.875      0.82142857 0.91071429]

mean value: 0.8810344827586207

key: train_roc_auc
value: [0.87967228 0.87573527 0.8797034  0.87770378 0.88582677 0.88582677
 0.88779528 0.88779528 0.89370079 0.88385827]

mean value: 0.8837617876816781

key: test_jcc
value: [0.9        0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
 0.6969697  0.77419355 0.70588235 0.83333333]

mean value: 0.7909401414524755

key: train_jcc
value: [0.78671329 0.77972028 0.78671329 0.78091873 0.79577465 0.7972028
 0.79929577 0.8006993  0.80918728 0.79442509]

mean value: 0.7930650467759314

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00749707 0.0071063  0.00800991 0.00797606 0.00807238 0.00814319
 0.00821209 0.00825953 0.0080924  0.00823736]

mean value: 0.007960629463195801

key: score_time
value: [0.01054406 0.01405478 0.01150608 0.0120914  0.0119555  0.01721978
 0.01335192 0.0119431  0.01190829 0.01170444]

mean value: 0.012627935409545899

key: test_mcc
value: [0.8953202  0.78940887 0.71921182 0.79110556 0.75047877 0.68250015
 0.60753044 0.75047877 0.58501794 0.82195294]

mean value: 0.7393005465274064

key: train_mcc
value: [0.78308641 0.78304441 0.77919572 0.79093074 0.79951627 0.78742599
 0.80317451 0.80759374 0.80324922 0.78395685]

mean value: 0.7921173847894009

key: test_accuracy
value: [0.94736842 0.89473684 0.85964912 0.89473684 0.875      0.83928571
 0.80357143 0.875      0.78571429 0.91071429]

mean value: 0.868577694235589

key: train_accuracy
value: [0.89151874 0.89151874 0.88954635 0.89546351 0.8996063  0.89370079
 0.9015748  0.90354331 0.9015748  0.89173228]

mean value: 0.8959779620742673

key: test_fscore
value: [0.94736842 0.89285714 0.86206897 0.9        0.87719298 0.83018868
 0.80701754 0.87272727 0.80645161 0.90909091]

mean value: 0.8704963529709496

key: train_fscore
value: [0.89236791 0.89151874 0.89019608 0.8950495  0.90097087 0.89411765
 0.90196078 0.90522244 0.90234375 0.89361702]

mean value: 0.8967364740693871

key: test_precision
value: [0.93103448 0.89285714 0.86206897 0.87096774 0.86206897 0.88
 0.79310345 0.88888889 0.73529412 0.92592593]

mean value: 0.8642209679323466

key: train_precision
value: [0.88715953 0.89328063 0.88326848 0.8968254  0.88888889 0.890625
 0.8984375  0.88973384 0.89534884 0.878327  ]

mean value: 0.8901895107400759

key: test_recall
value: [0.96428571 0.89285714 0.86206897 0.93103448 0.89285714 0.78571429
 0.82142857 0.85714286 0.89285714 0.89285714]

mean value: 0.8793103448275862

key: train_recall
value: [0.8976378  0.88976378 0.8972332  0.89328063 0.91338583 0.8976378
 0.90551181 0.92125984 0.90944882 0.90944882]

mean value: 0.9034608322181071

key: test_roc_auc
value: [0.9476601  0.89470443 0.85960591 0.89408867 0.875      0.83928571
 0.80357143 0.875      0.78571429 0.91071429]

mean value: 0.8685344827586207

key: train_roc_auc
value: [0.89150664 0.89152221 0.88956148 0.89545921 0.8996063  0.89370079
 0.9015748  0.90354331 0.9015748  0.89173228]

mean value: 0.8959781830630855

key: test_jcc
value: [0.9        0.80645161 0.75757576 0.81818182 0.78125    0.70967742
 0.67647059 0.77419355 0.67567568 0.83333333]

mean value: 0.7732809753647041

key: train_jcc
value: [0.80565371 0.80427046 0.80212014 0.81003584 0.81978799 0.80851064
 0.82142857 0.82685512 0.82206406 0.80769231]

mean value: 0.8128418840416354

MCC on Blind test: 0.25

Accuracy on Blind test: 0.72

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01613593 0.01766896 0.01700807 0.01466203 0.01586533 0.01819515
 0.01554966 0.01456881 0.01771259 0.01677132]

mean value: 0.016413784027099608

key: score_time
value: [0.00918293 0.01023507 0.00923419 0.00916266 0.01018572 0.01020145
 0.00912857 0.00951362 0.0102036  0.00925684]

mean value: 0.009630465507507324

key: test_mcc
value: [0.8953202  0.8953202  0.85960591 0.75462449 0.71611487 0.78772636
 0.64285714 0.75047877 0.64450339 0.78772636]

mean value: 0.7734277700975402

key: train_mcc
value: [0.77528914 0.77528914 0.77932046 0.78708603 0.79537422 0.78376226
 0.80337378 0.79163927 0.79926835 0.77574087]

mean value: 0.7866143511152437

key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.87719298 0.85714286 0.89285714
 0.82142857 0.875      0.82142857 0.89285714]

mean value: 0.8862468671679198

key: train_accuracy
value: [0.88757396 0.88757396 0.88954635 0.89349112 0.8976378  0.89173228
 0.9015748  0.89566929 0.8996063  0.88779528]

mean value: 0.8932201152370747

key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.88135593 0.86206897 0.88888889
 0.82142857 0.87272727 0.82758621 0.88888889]

mean value: 0.8868716051414689

key: train_fscore
value: [0.88888889 0.88888889 0.890625   0.89411765 0.8984375  0.89320388
 0.90272374 0.89708738 0.90019569 0.88888889]

mean value: 0.8943057505986216

key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.86666667 0.83333333 0.92307692
 0.82142857 0.88888889 0.8        0.92307692]

mean value: 0.8849574754747168

key: train_precision
value: [0.88030888 0.88030888 0.88030888 0.88715953 0.89147287 0.88122605
 0.89230769 0.88505747 0.89494163 0.88030888]

mean value: 0.8853400773979657

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.85714286]

mean value: 0.8899014778325123

key: train_recall
value: [0.8976378  0.8976378  0.90118577 0.90118577 0.90551181 0.90551181
 0.91338583 0.90944882 0.90551181 0.8976378 ]

mean value: 0.9034655006068906

key: test_roc_auc
value: [0.9476601  0.9476601  0.92980296 0.87684729 0.85714286 0.89285714
 0.82142857 0.875      0.82142857 0.89285714]

mean value: 0.886268472906404

key: train_roc_auc
value: [0.88755408 0.88755408 0.88956926 0.89350627 0.8976378  0.89173228
 0.9015748  0.89566929 0.8996063  0.88779528]

mean value: 0.8932199433568828

key: test_jcc
value: [0.9        0.9        0.87096774 0.78787879 0.75757576 0.8
 0.6969697  0.77419355 0.70588235 0.8       ]

mean value: 0.7993467885687999

key: train_jcc
value: [0.8        0.8        0.8028169  0.80851064 0.81560284 0.80701754
 0.82269504 0.81338028 0.81850534 0.8       ]

mean value: 0.808852857567483

MCC on Blind test: 0.22

Accuracy on Blind test: 0.71

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
(The ConvergenceWarning above was emitted 11 times in total.)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])
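The MLP run raised a ConvergenceWarning on every fit because the Adam optimizer hit `max_iter=500` before its stopping criterion. Two common remedies are raising `max_iter` or setting `early_stopping=True`. Early stopping holds out a validation fraction and stops when the validation score stalls, which also acts as a mild regulariser. The sketch below counts the warnings under each setting on synthetic data. The data and the alternative settings are assumptions, and it does not claim how the real rpoB model would behave.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real training data.
X, y = make_classification(n_samples=500, n_features=50, random_state=42)

def count_convergence_warnings(model):
    """Fit `model` and return how many ConvergenceWarnings the fit raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)

# Settings from the log, then two hypothetical alternatives.
baseline = count_convergence_warnings(MLPClassifier(max_iter=500, random_state=42))
longer = count_convergence_warnings(MLPClassifier(max_iter=2000, random_state=42))
early = count_convergence_warnings(
    MLPClassifier(max_iter=2000, early_stopping=True, random_state=42))
print(baseline, longer, early)
```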

key: fit_time
value: [1.39372373 1.54294658 1.46989751 1.48421979 1.53672004 1.60921836
 1.45766068 1.5458262  1.44554925 1.52575564]

mean value: 1.5011517763137818

key: score_time
value: [0.01374149 0.01342797 0.01947975 0.01363492 0.01389122 0.02115655
 0.0138762  0.01382446 0.01419258 0.01371765]

mean value: 0.015094280242919922

key: test_mcc
value: [0.8951918  0.8953202  0.82490815 0.85960591 0.75047877 0.89802651
 0.85933785 0.78772636 0.78772636 0.85714286]

mean value: 0.8415464773043235

key: train_mcc
value: [0.98028353 0.96055211 0.97239383 0.96055211 0.97640822 0.97244848
 0.96463421 0.96850394 0.9645744  0.96853396]

mean value: 0.9688884814344612

key: test_accuracy
value: [0.94736842 0.94736842 0.9122807  0.92982456 0.875      0.94642857
 0.92857143 0.89285714 0.89285714 0.92857143]

mean value: 0.9201127819548872

key: train_accuracy
value: [0.99013807 0.98027613 0.98619329 0.98027613 0.98818898 0.98622047
 0.98228346 0.98425197 0.98228346 0.98425197]

mean value: 0.9844363944151951

key: test_fscore
value: [0.94545455 0.94736842 0.91525424 0.93103448 0.87719298 0.94339623
 0.93103448 0.88888889 0.89655172 0.92857143]

mean value: 0.9204747419782038

key: train_fscore
value: [0.99017682 0.98031496 0.98613861 0.98023715 0.98814229 0.98619329
 0.98217822 0.98425197 0.98224852 0.98418972]

mean value: 0.9844071562661963

key: test_precision
value: [0.96296296 0.93103448 0.9        0.93103448 0.86206897 1.
 0.9        0.92307692 0.86666667 0.92857143]

mean value: 0.9205415912312465

key: train_precision
value: [0.98823529 0.98031496 0.98809524 0.98023715 0.99206349 0.98814229
 0.98804781 0.98425197 0.98418972 0.98809524]

mean value: 0.9861673170230888

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
 0.96428571 0.85714286 0.92857143 0.92857143]

mean value: 0.9219211822660098

key: train_recall
value: [0.99212598 0.98031496 0.98418972 0.98023715 0.98425197 0.98425197
 0.97637795 0.98425197 0.98031496 0.98031496]

mean value: 0.9826631601879805

key: test_roc_auc
value: [0.94704433 0.9476601  0.91194581 0.92980296 0.875      0.94642857
 0.92857143 0.89285714 0.89285714 0.92857143]

mean value: 0.9200738916256158

key: train_roc_auc
value: [0.99013414 0.98027606 0.98618935 0.98027606 0.98818898 0.98622047
 0.98228346 0.98425197 0.98228346 0.98425197]

mean value: 0.9844355917960848

key: test_jcc
value: [0.89655172 0.9        0.84375    0.87096774 0.78125    0.89285714
 0.87096774 0.8        0.8125     0.86666667]

mean value: 0.8535511017532709

key: train_jcc
value: [0.98054475 0.96138996 0.97265625 0.96124031 0.9765625  0.97276265
 0.96498054 0.96899225 0.96511628 0.9688716 ]

mean value: 0.9693117081673194

MCC on Blind test: 0.26

Accuracy on Blind test: 0.66
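Set side by side, the five completed model blocks rank differently by mean CV test MCC and by blind-test MCC. MLP scores highest in cross-validation (0.84), but Gaussian NB is highest on the blind set (0.34). Every model loses between about 0.3 and 0.6 MCC from CV to blind test. The tabulation below uses only numbers copied from the log.

```python
# Mean 10-fold CV test MCC and blind-test MCC, copied from the log above.
results = {
    "Gaussian NB": (0.6602150384851577, 0.34),
    "Naive Bayes": (0.7625646979424463, 0.28),
    "K-Nearest Neighbors": (0.7393005465274064, 0.25),
    "SVM": (0.7734277700975402, 0.22),
    "MLP": (0.8415464773043235, 0.26),
}

# Sort by blind-test MCC, best first, and show the CV-to-blind drop.
for name, (cv_mcc, blind_mcc) in sorted(results.items(),
                                        key=lambda kv: kv[1][1], reverse=True):
    print(f"{name:<20} cv={cv_mcc:.2f} blind={blind_mcc:.2f} "
          f"drop={cv_mcc - blind_mcc:.2f}")
```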

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])
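The `prep` step applies `MinMaxScaler` to the numeric columns and `OneHotEncoder` to the seven categorical ones, and passes any remaining columns through. The stdlib-only sketch below illustrates the per-column arithmetic. It is not scikit-learn's implementation: `min_max_scale`, `one_hot` and the `"helix"`/`"strand"` labels are hypothetical.

```python
def min_max_scale(values):
    """Rescale a numeric column to [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: map every value to 0.0 instead of dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator rows, one slot per sorted category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


print(min_max_scale([2.0, 4.0, 6.0]))
print(one_hot(["helix", "strand", "helix"]))  # hypothetical ss_class-style labels
```

Within the pipeline, the minimum, maximum and category set are presumably learned on each training fold and then reused unchanged on the held-out fold, so no test-fold statistics leak into the preprocessing.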

key: fit_time
value: [0.01417804 0.01202106 0.01139545 0.01080203 0.01007557 0.01062059
 0.01066804 0.01060534 0.01126242 0.01187325]

mean value: 0.011350178718566894

key: score_time
value: [0.01092696 0.00883508 0.00887847 0.00816321 0.00810766 0.00819612
 0.00795102 0.00797558 0.00864434 0.00838518]

mean value: 0.008606362342834472

key: test_mcc
value: [0.93202124 0.8951918  0.85960591 0.8953202  0.75434227 0.96490128
 0.75434227 0.89342711 0.96490128 0.92857143]

mean value: 0.8842624793067261
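Each `mean value` line appears to be the plain unweighted average of the ten per-fold scores printed above it. The check below reproduces the Decision Tree `test_mcc` mean from the rounded fold values copied from this log:

```python
from statistics import mean

# Per-fold Decision Tree test_mcc values, copied (rounded) from the log
test_mcc = [0.93202124, 0.8951918, 0.85960591, 0.8953202, 0.75434227,
            0.96490128, 0.75434227, 0.89342711, 0.96490128, 0.92857143]

# Unweighted average over the 10 folds
print(round(mean(test_mcc), 6))  # 0.884262
```

Individual folds range from 0.754 to 0.965, so the mean by itself hides a fair amount of fold-to-fold spread.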

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.94736842 0.92982456 0.94736842 0.875      0.98214286
 0.875      0.94642857 0.98214286 0.96428571]

mean value: 0.9414473684210526

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96296296 0.94545455 0.93103448 0.94736842 0.88135593 0.98181818
 0.88135593 0.94736842 0.98181818 0.96428571]

mean value: 0.942482277561025

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96296296 0.93103448 0.96428571 0.83870968 1.
 0.83870968 0.93103448 1.         0.96428571]

mean value: 0.9431022711890342

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92857143 0.92857143 0.93103448 0.93103448 0.92857143 0.96428571
 0.92857143 0.96428571 0.96428571 0.96428571]

mean value: 0.9433497536945813

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96428571 0.94704433 0.92980296 0.9476601  0.875      0.98214286
 0.875      0.94642857 0.98214286 0.96428571]

mean value: 0.9413793103448276
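For the Decision Tree, `test_roc_auc` tracks `test_accuracy` closely. One plausible explanation, which the log itself does not confirm: an unpruned tree outputs probabilities of 0 or 1, and with such hard scores ROC AUC reduces to balanced accuracy, (TPR + TNR) / 2. For fold 0, recall 0.92857143 = 26/28 and precision 1.0 imply 26 TP, 2 FN and 0 FP. Accuracy 0.96491228 = 55/57 then leaves 29 TN. These counts are inferred; the script does not print them.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


# Fold-0 counts inferred from the logged recall, precision and accuracy
tp, fn, fp, tn = 26, 2, 0, 29
print(round(balanced_accuracy(tp, fn, tn, fp), 8))  # 0.96428571, the logged test_roc_auc
print(round((tp + tn) / (tp + fn + tn + fp), 8))    # 0.96491228, the logged test_accuracy
```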

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.92857143 0.89655172 0.87096774 0.9        0.78787879 0.96428571
 0.78787879 0.9        0.96428571 0.93103448]

mean value: 0.8931454381732469
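Assuming `test_jcc` is the binary Jaccard score for the positive class, it is not an independent measurement. J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN) satisfy J = F1 / (2 - F1), so every jcc value can be recovered from the matching `test_fscore`. The check below uses Decision Tree folds 0 and 4. `jaccard_from_f1` is a hypothetical helper.

```python
def jaccard_from_f1(f1):
    """Convert an F1 score to the positive-class Jaccard index: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)


# Decision Tree test_fscore for folds 0 and 4, copied from the log
for f1 in (0.96296296, 0.88135593):
    # Expected to match the logged test_jcc values 0.92857143 and 0.78787879
    print(round(jaccard_from_f1(f1), 6))
```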

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.36
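MCC uses all four cells of the confusion matrix, so it can diverge sharply from accuracy, especially when the blind test classes are unbalanced. The stdlib sketch below implements the standard formula. The `mcc` helper is hypothetical. The fold-0 counts (26 TP, 29 TN, 0 FP, 2 FN) are inferred from the Decision Tree's logged recall, precision and accuracy, not printed by the script.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a row or column of the matrix is empty; report 0.0 by convention
        return 0.0
    return (tp * tn - fp * fn) / denom


# Reproduces the logged Decision Tree fold-0 test_mcc (0.93202124)
print(round(mcc(26, 29, 0, 2), 6))  # 0.932021
```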

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10496068 0.10379076 0.104743   0.10231304 0.1054554  0.10581684
 0.10448885 0.10502958 0.10399008 0.10775542]

mean value: 0.10483436584472657

key: score_time
value: [0.01817036 0.01749301 0.01778865 0.01884627 0.01766968 0.01870561
 0.01813245 0.01771808 0.01825023 0.01763487]

mean value: 0.018040919303894044

key: test_mcc
value: [0.8953202  0.86189955 0.85960591 0.82490815 0.75434227 0.96490128
 0.82618439 0.82195294 0.68250015 0.92857143]

mean value: 0.8420186261363041

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.92982456 0.92982456 0.9122807  0.875      0.98214286
 0.91071429 0.91071429 0.83928571 0.96428571]

mean value: 0.9201441102756892

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94736842 0.93103448 0.93103448 0.91525424 0.88135593 0.98245614
 0.91525424 0.90909091 0.84745763 0.96428571]

mean value: 0.9224592184195679

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.93103448 0.9        0.93103448 0.9        0.83870968 0.96551724
 0.87096774 0.92592593 0.80645161 0.96428571]

mean value: 0.9033926879366256

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 1.
 0.96428571 0.89285714 0.89285714 0.96428571]

mean value: 0.9433497536945813

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9476601  0.93041872 0.92980296 0.91194581 0.875      0.98214286
 0.91071429 0.91071429 0.83928571 0.96428571]

mean value: 0.9201970443349754

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9        0.87096774 0.87096774 0.84375    0.78787879 0.96551724
 0.84375    0.83333333 0.73529412 0.93103448]

mean value: 0.8582493446868079

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.33

Accuracy on Blind test: 0.71

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.0083456  0.00800943 0.00787878 0.00753284 0.00806522 0.00796127
 0.00782919 0.00870728 0.00797272 0.00807309]

mean value: 0.008037543296813965

key: score_time
value: [0.00834203 0.00855613 0.00791216 0.00868464 0.00868344 0.00816584
 0.00837827 0.00838041 0.00823331 0.00819325]

mean value: 0.008352947235107423

key: test_mcc
value: [0.8951918  0.68850906 0.79110556 0.78940887 0.57142857 0.65814518
 0.4330127  0.85714286 0.78772636 0.64450339]

mean value: 0.7116174353174761

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94736842 0.84210526 0.89473684 0.89473684 0.78571429 0.82142857
 0.71428571 0.92857143 0.89285714 0.82142857]

mean value: 0.8543233082706767

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94545455 0.84745763 0.9        0.89655172 0.78571429 0.8
 0.73333333 0.92857143 0.88888889 0.81481481]

mean value: 0.8540786648033872

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96296296 0.80645161 0.87096774 0.89655172 0.78571429 0.90909091
 0.6875     0.92857143 0.92307692 0.84615385]

mean value: 0.8617041434546996

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92857143 0.89285714 0.93103448 0.89655172 0.78571429 0.71428571
 0.78571429 0.92857143 0.85714286 0.78571429]

mean value: 0.850615763546798

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94704433 0.8429803  0.89408867 0.89470443 0.78571429 0.82142857
 0.71428571 0.92857143 0.89285714 0.82142857]

mean value: 0.8543103448275862

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.89655172 0.73529412 0.81818182 0.8125     0.64705882 0.66666667
 0.57894737 0.86666667 0.8        0.6875    ]

mean value: 0.7509367185250606

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.23

Accuracy on Blind test: 0.71

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.3327291  1.3053844  1.30537295 1.2954855  1.29172778 1.30082202
 1.30300689 1.31275725 1.33773541 1.36065793]

mean value: 1.3145679235458374

key: score_time
value: [0.09119868 0.0915029  0.14295626 0.09044981 0.09054136 0.09039283
 0.09088302 0.09034443 0.09237862 0.09947395]

mean value: 0.09701218605041503

key: test_mcc
value: [0.96547546 0.8953202  0.92980296 0.8951918  0.85933785 1.
 0.92857143 0.89342711 0.93094934 0.92857143]

mean value: 0.922664756643307

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.9609962406015038

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.93103448 1.
 0.96428571 0.94736842 0.96296296 0.96428571]

mean value: 0.961379368196865

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.93103448 0.96551724 0.93333333 0.9        1.
 0.96428571 0.93103448 1.         0.96428571]

mean value: 0.9589490968801314

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.96428571 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.9645320197044335

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.9476601  0.96490148 0.94704433 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.960960591133005

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.9        0.93333333 0.90322581 0.87096774 1.
 0.93103448 0.9        0.92857143 0.93103448]

mean value: 0.9262452990094814

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.19

Accuracy on Blind test: 0.48

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
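The Random Forest2 configuration sets `max_features='auto'`, which triggers the FutureWarning above. The hypothetical helper below follows the warning's own advice and rewrites the keyword arguments before the estimator is built:

```python
def upgrade_rf_params(params):
    """Return a copy of RandomForestClassifier kwargs with max_features='auto' replaced."""
    fixed = dict(params)
    if fixed.get("max_features") == "auto":
        # Per the FutureWarning text, 'sqrt' keeps the past behaviour for classifiers
        fixed["max_features"] = "sqrt"
    return fixed


# Random Forest2 settings as printed in the log
rf2_params = {"max_features": "auto", "min_samples_leaf": 5, "n_estimators": 1000,
              "n_jobs": 10, "oob_score": True, "random_state": 42}
print(upgrade_rf_params(rf2_params)["max_features"])  # sqrt
```

You can then pass the result as `RandomForestClassifier(**upgrade_rf_params(rf2_params))`. According to the warning, `'sqrt'` reproduces the old `'auto'` behaviour for classifiers, so the scores should be unchanged and the warning should no longer appear.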

key: fit_time
value: [0.90509057 0.91448712 0.93358636 0.96710658 0.93032479 0.91889691
 0.90740323 0.95062542 0.90149426 0.91975093]

mean value: 0.924876618385315

key: score_time
value: [0.17131925 0.23245525 0.21618485 0.23573542 0.27564526 0.17861819
 0.19408727 0.25907493 0.20869422 0.24864411]

mean value: 0.2220458745956421

key: test_mcc
value: [0.96547546 0.8953202  0.92980296 0.8951918  0.85714286 1.
 0.92857143 0.89342711 0.93094934 0.92857143]

mean value: 0.9224452574728608

key: train_mcc
value: [0.94890036 0.95277969 0.94878539 0.95278262 0.95687833 0.94112724
 0.94499908 0.95278544 0.94101052 0.94900279]

mean value: 0.9489051458683196

key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.9609962406015038

key: train_accuracy
value: [0.97435897 0.97633136 0.97435897 0.97633136 0.97834646 0.97047244
 0.97244094 0.97637795 0.97047244 0.97440945]

mean value: 0.974390035565081

key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.92857143 1.
 0.96428571 0.94736842 0.96296296 0.96428571]

mean value: 0.9611330627781457

key: train_fscore
value: [0.97465887 0.9765625  0.97445972 0.97647059 0.9785575  0.97076023
 0.97265625 0.97647059 0.97064579 0.97465887]

mean value: 0.9745900921567919

key: test_precision
value: [1.         0.93103448 0.96551724 0.93333333 0.92857143 1.
 0.96428571 0.93103448 1.         0.96428571]

mean value: 0.9618062397372742

key: train_precision
value: [0.96525097 0.96899225 0.96875    0.9688716  0.96911197 0.96138996
 0.96511628 0.97265625 0.96498054 0.96525097]

mean value: 0.9670370778213465

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.92857143 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.9609605911330049

key: train_recall
value: [0.98425197 0.98425197 0.98023715 0.98418972 0.98818898 0.98031496
 0.98031496 0.98031496 0.97637795 0.98425197]

mean value: 0.9822694594005789

key: test_roc_auc
value: [0.98214286 0.9476601  0.96490148 0.94704433 0.92857143 1.
 0.96428571 0.94642857 0.96428571 0.96428571]

mean value: 0.960960591133005

key: train_roc_auc
value: [0.97433942 0.97631571 0.97437055 0.97634683 0.97834646 0.97047244
 0.97244094 0.97637795 0.97047244 0.97440945]

mean value: 0.9743892191341695

key: test_jcc
value: [0.96428571 0.9        0.93333333 0.90322581 0.86666667 1.
 0.93103448 0.9        0.92857143 0.93103448]

mean value: 0.9258151914825997

key: train_jcc
value: [0.95057034 0.95419847 0.95019157 0.95402299 0.95801527 0.94318182
 0.94676806 0.95402299 0.94296578 0.95057034]

mean value: 0.9504507631247383

MCC on Blind test: 0.21

Accuracy on Blind test: 0.52
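For the five models logged in this section, the mean 10-fold CV MCC (0.71 to 0.92) sits well above the blind-test MCC (0.11 to 0.33). The sketch below tabulates the gap from the printed values; the dictionary is copied from this log.

```python
# (mean CV test_mcc, blind-test MCC) per model, copied from this log
results = {
    "Decision Tree": (0.8842624793067261, 0.11),
    "Extra Trees": (0.8420186261363041, 0.33),
    "Extra Tree": (0.7116174353174761, 0.23),
    "Random Forest": (0.922664756643307, 0.19),
    "Random Forest2": (0.9224452574728608, 0.21),
}

# Drop from cross-validation to the blind test; a large gap suggests an optimistic CV estimate
gaps = {name: round(cv - blind, 3) for name, (cv, blind) in results.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} CV-blind MCC gap: {gap}")
```

Extra Trees has the best blind-test MCC (0.33). The Decision Tree, which fits its training folds perfectly (train_mcc 1.0), shows the largest drop.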

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01842809 0.00753212 0.00757003 0.00761199 0.0075736  0.00752425
 0.00748992 0.00754261 0.00776935 0.00763273]

mean value: 0.008667469024658203

key: score_time
value: [0.01342535 0.00787449 0.00798821 0.007864   0.00845551 0.00779438
 0.00783849 0.00777411 0.00842023 0.00789094]

mean value: 0.008532571792602538

key: test_mcc
value: [0.8953202  0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
 0.64285714 0.75047877 0.64450339 0.82195294]

mean value: 0.7625646979424463

key: train_mcc
value: [0.75941547 0.75148224 0.759525   0.75544282 0.77167747 0.77186893
 0.77564465 0.77588525 0.78749923 0.76800824]

mean value: 0.7676449294755058

key: test_accuracy
value: [0.94736842 0.9122807  0.92982456 0.85964912 0.85714286 0.875
 0.82142857 0.875      0.82142857 0.91071429]

mean value: 0.8809837092731829

key: train_accuracy
value: [0.87968442 0.87573964 0.87968442 0.87771203 0.88582677 0.88582677
 0.88779528 0.88779528 0.89370079 0.88385827]

mean value: 0.8837623662426812

key: test_fscore
value: [0.94736842 0.9122807  0.93103448 0.86206897 0.86206897 0.87272727
 0.82142857 0.87272727 0.82758621 0.90909091]

mean value: 0.8818381769470699

key: train_fscore
value: [0.88062622 0.8762279  0.88062622 0.87698413 0.88627451 0.88715953
 0.88845401 0.88932039 0.89453125 0.88543689]

mean value: 0.8845641057179913

key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
 0.82142857 0.88888889 0.8        0.92592593]

mean value: 0.8779155263638022

key: train_precision
value: [0.87548638 0.8745098  0.87209302 0.88047809 0.8828125  0.87692308
 0.88326848 0.87739464 0.8875969  0.87356322]

mean value: 0.8784126109194028

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8864532019704433

key: train_recall
value: [0.88582677 0.87795276 0.88932806 0.87351779 0.88976378 0.8976378
 0.89370079 0.9015748  0.9015748  0.8976378 ]

mean value: 0.8908515141140954

key: test_roc_auc
value: [0.9476601  0.91256158 0.92980296 0.85960591 0.85714286 0.875
 0.82142857 0.875      0.82142857 0.91071429]

mean value: 0.8810344827586207

key: train_roc_auc
value: [0.87967228 0.87573527 0.8797034  0.87770378 0.88582677 0.88582677
 0.88779528 0.88779528 0.89370079 0.88385827]

mean value: 0.8837617876816781

key: test_jcc
value: [0.9        0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
 0.6969697  0.77419355 0.70588235 0.83333333]

mean value: 0.7909401414524755

key: train_jcc
value: [0.78671329 0.77972028 0.78671329 0.78091873 0.79577465 0.7972028
 0.79929577 0.8006993  0.80918728 0.79442509]

mean value: 0.7930650467759314

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06567097 0.04970622 0.05028796 0.05296206 0.05908322 0.0577023
 0.05627537 0.05472136 0.06450558 0.06165719]

mean value: 0.057257223129272464

key: score_time
value: [0.00984359 0.00965667 0.00961947 0.01044655 0.01020241 0.01003504
 0.01031113 0.00977564 0.01015902 0.00963831]

mean value: 0.009968781471252441

key: test_mcc
value: [0.96547546 0.8951918  0.92980296 0.8951918  0.89342711 1.
 0.96490128 0.89342711 0.96490128 0.92857143]

mean value: 0.9330890233388842

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.94642857 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9663533834586466

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.94545455 0.96551724 0.94915254 0.94736842 1.
 0.98245614 0.94736842 0.98181818 0.96428571]

mean value: 0.9665239389584955

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96296296 0.96551724 0.93333333 0.93103448 1.
 0.96551724 0.93103448 1.         0.96428571]

mean value: 0.9653685458857872

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.92857143 0.96551724 0.96551724 0.96428571 1.
 1.         0.96428571 0.96428571 0.96428571]

mean value: 0.9681034482758621

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.94704433 0.96490148 0.94704433 0.94642857 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.966256157635468

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.89655172 0.93333333 0.90322581 0.9        1.
 0.96551724 0.9        0.96428571 0.93103448]

mean value: 0.9358234016632236

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.37

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01330185 0.04045773 0.04038882 0.04065561 0.04066443 0.04291534
 0.04127479 0.04550576 0.04016542 0.04041195]

mean value: 0.03857417106628418

key: score_time
value: [0.01009989 0.01929498 0.01896739 0.01059246 0.01052094 0.01064014
 0.02125072 0.01962495 0.01934791 0.01900911]

mean value: 0.015934848785400392

key: test_mcc
value: [0.85960591 0.8953202  0.85960591 0.82490815 0.75434227 0.82195294
 0.71611487 0.71611487 0.64450339 0.85933785]

mean value: 0.7951806363828539

key: train_mcc
value: [0.87014673 0.87036164 0.85437653 0.85842397 0.8746939  0.85134433
 0.83910959 0.85465533 0.87089581 0.85513299]

mean value: 0.8599140831820147

key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807  0.875      0.91071429
 0.85714286 0.85714286 0.82142857 0.92857143]

mean value: 0.8969298245614035

key: train_accuracy
value: [0.93491124 0.93491124 0.9270217  0.92899408 0.93700787 0.92519685
 0.91929134 0.92716535 0.93503937 0.92716535]

mean value: 0.9296704406032086

key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
 0.86206897 0.85185185 0.82758621 0.92592593]

mean value: 0.8980108361156687

key: train_fscore
value: [0.93592233 0.93617021 0.92787524 0.92996109 0.93822394 0.92692308
 0.92069632 0.92815534 0.93641618 0.92870906]

mean value: 0.9309052796774193

key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9        0.83870968 0.92592593
 0.83333333 0.88461538 0.8        0.96153846]

mean value: 0.893476317692113

key: train_precision
value: [0.92337165 0.92015209 0.91538462 0.91570881 0.92045455 0.90601504
 0.90494297 0.91570881 0.91698113 0.90943396]

mean value: 0.914815362183764

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
 0.89285714 0.82142857 0.85714286 0.89285714]

mean value: 0.904064039408867

key: train_recall
value: [0.9488189  0.95275591 0.94071146 0.94466403 0.95669291 0.9488189
 0.93700787 0.94094488 0.95669291 0.9488189 ]

mean value: 0.9475926675173508

key: test_roc_auc
value: [0.92980296 0.9476601  0.92980296 0.91194581 0.875      0.91071429
 0.85714286 0.85714286 0.82142857 0.92857143]

mean value: 0.8969211822660099

key: train_roc_auc
value: [0.93488376 0.93487598 0.92704864 0.92902493 0.93700787 0.92519685
 0.91929134 0.92716535 0.93503937 0.92716535]

mean value: 0.9296699449130124

key: test_jcc
value: [0.86666667 0.9        0.87096774 0.84375    0.78787879 0.83333333
 0.75757576 0.74193548 0.70588235 0.86206897]

mean value: 0.8170059089719415

key: train_jcc
value: [0.87956204 0.88       0.86545455 0.86909091 0.88363636 0.86379928
 0.85304659 0.86594203 0.88043478 0.86690647]

mean value: 0.8707873026527986

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01948237 0.00829005 0.0076189  0.00788856 0.00796366 0.00804019
 0.00793099 0.00874853 0.00803947 0.00794411]

mean value: 0.009194684028625489

key: score_time
value: [0.00859404 0.0083673  0.00849128 0.00827527 0.00781775 0.00821924
 0.00822783 0.00834608 0.00824499 0.00831842]

mean value: 0.0082902193069458

key: test_mcc
value: [0.8953202  0.82512315 0.85960591 0.78940887 0.71611487 0.75047877
 0.64285714 0.75047877 0.64450339 0.82195294]

mean value: 0.7695844023759438

key: train_mcc
value: [0.75941547 0.76333276 0.76341509 0.77515483 0.77955173 0.77186893
 0.78749923 0.77962424 0.78742599 0.76786532]

mean value: 0.7735153587774711

key: test_accuracy
value: [0.94736842 0.9122807  0.92982456 0.89473684 0.85714286 0.875
 0.82142857 0.875      0.82142857 0.91071429]

mean value: 0.8844924812030075

key: train_accuracy
value: [0.87968442 0.8816568  0.8816568  0.88757396 0.88976378 0.88582677
 0.89370079 0.88976378 0.89370079 0.88385827]

mean value: 0.88671861653388

key: test_fscore
value: [0.94736842 0.9122807  0.93103448 0.89655172 0.86206897 0.87272727
 0.82142857 0.87272727 0.82758621 0.90909091]

mean value: 0.8852864528091389

key: train_fscore
value: [0.88062622 0.88235294 0.88235294 0.88757396 0.89019608 0.88715953
 0.89453125 0.890625   0.89411765 0.88499025]

mean value: 0.8874525831917391

key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.89655172 0.83333333 0.88888889
 0.82142857 0.88888889 0.8        0.92592593]

mean value: 0.8813638022258712

key: train_precision
value: [0.87548638 0.87890625 0.87548638 0.88582677 0.88671875 0.87692308
 0.8875969  0.88372093 0.890625   0.87644788]

mean value: 0.8817738317127776

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8899014778325123

key: train_recall
value: [0.88582677 0.88582677 0.88932806 0.88932806 0.89370079 0.8976378
 0.9015748  0.8976378  0.8976378  0.89370079]

mean value: 0.8932199433568827

key: test_roc_auc
value: [0.9476601  0.91256158 0.92980296 0.89470443 0.85714286 0.875
 0.82142857 0.875      0.82142857 0.91071429]

mean value: 0.8845443349753694

key: train_roc_auc
value: [0.87967228 0.88164856 0.88167191 0.88757742 0.88976378 0.88582677
 0.89370079 0.88976378 0.89370079 0.88385827]

mean value: 0.8867184339111761

key: test_jcc
value: [0.9        0.83870968 0.87096774 0.8125     0.75757576 0.77419355
 0.6969697  0.77419355 0.70588235 0.83333333]

mean value: 0.7964325656948996

key: train_jcc
value: [0.78671329 0.78947368 0.78947368 0.79787234 0.80212014 0.7972028
 0.80918728 0.8028169  0.80851064 0.79370629]

mean value: 0.7977077046669985

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00967669 0.01218224 0.01342869 0.01291203 0.01177907 0.01280165
 0.01166844 0.01290774 0.01253223 0.01339507]

mean value: 0.012328386306762695

key: score_time
value: [0.00771666 0.00980401 0.00986791 0.01040888 0.01046586 0.01049089
 0.010355   0.01109457 0.01037741 0.01041889]

mean value: 0.010100007057189941

key: test_mcc
value: [0.93202124 0.8953202  0.82942474 0.86189955 0.75047877 1.
 0.79385662 0.78571429 0.75047877 0.78571429]

mean value: 0.8384908463006829

key: train_mcc
value: [0.90172947 0.91347458 0.90633247 0.84245181 0.90979438 0.87444958
 0.84046723 0.88616336 0.8819171  0.87366794]

mean value: 0.8830447927155709

key: test_accuracy
value: [0.96491228 0.94736842 0.9122807  0.92982456 0.875      1.
 0.89285714 0.89285714 0.875      0.89285714]

mean value: 0.9182957393483709

key: train_accuracy
value: [0.95069034 0.9566075  0.95266272 0.92110454 0.95472441 0.93700787
 0.91929134 0.94291339 0.94094488 0.93503937]

mean value: 0.9410986348599916

key: test_fscore
value: [0.96296296 0.94736842 0.90909091 0.92857143 0.87272727 1.
 0.9        0.89285714 0.87719298 0.89285714]

mean value: 0.9183628262575632

key: train_fscore
value: [0.9500998  0.9561753  0.951417   0.921875   0.95409182 0.936
 0.92190476 0.94368932 0.94117647 0.93785311]

mean value: 0.941428257984581

key: test_precision
value: [1.         0.93103448 0.96153846 0.96296296 0.88888889 1.
 0.84375    0.89285714 0.86206897 0.89285714]

mean value: 0.9235958047380461

key: train_precision
value: [0.96356275 0.96774194 0.97510373 0.91119691 0.96761134 0.95121951
 0.89298893 0.93103448 0.9375     0.89891697]

mean value: 0.9396876562541508

key: test_recall
value: [0.92857143 0.96428571 0.86206897 0.89655172 0.85714286 1.
 0.96428571 0.89285714 0.89285714 0.89285714]

mean value: 0.9151477832512316

key: train_recall
value: [0.93700787 0.94488189 0.92885375 0.93280632 0.94094488 0.92125984
 0.95275591 0.95669291 0.94488189 0.98031496]

mean value: 0.9440400236531699

key: test_roc_auc
value: [0.96428571 0.9476601  0.91317734 0.93041872 0.875      1.
 0.89285714 0.89285714 0.875      0.89285714]

mean value: 0.9184113300492611

key: train_roc_auc
value: [0.95071738 0.95663067 0.95261585 0.92112757 0.95472441 0.93700787
 0.91929134 0.94291339 0.94094488 0.93503937]

mean value: 0.9411012729140082

key: test_jcc
value: [0.92857143 0.9        0.83333333 0.86666667 0.77419355 1.
 0.81818182 0.80645161 0.78125    0.80645161]

mean value: 0.8515100020946795

key: train_jcc
value: [0.90494297 0.91603053 0.90733591 0.85507246 0.91221374 0.87969925
 0.85512367 0.89338235 0.88888889 0.88297872]

mean value: 0.8895668499958933

MCC on Blind test: 0.19

Accuracy on Blind test: 0.58

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01474547 0.01279259 0.01564717 0.01446366 0.01306367 0.01507711
 0.01365781 0.01378751 0.01366973 0.01418114]

mean value: 0.014108586311340331

key: score_time
value: [0.01045132 0.01075387 0.0109086  0.01076341 0.01093888 0.01090598
 0.01117682 0.01138139 0.01105213 0.01108289]

mean value: 0.010941529273986816

key: test_mcc
value: [0.8951918  0.92980296 0.8951918  0.8953202  0.64951905 0.8660254
 0.70082556 0.82195294 0.89342711 0.92857143]

mean value: 0.8475828256723696

key: train_mcc
value: [0.91324443 0.8974355  0.9215681  0.93352251 0.878014   0.86150531
 0.84768598 0.89200643 0.92554839 0.91732994]

mean value: 0.8987860594733551

key: test_accuracy
value: [0.94736842 0.96491228 0.94736842 0.94736842 0.82142857 0.92857143
 0.83928571 0.91071429 0.94642857 0.96428571]

mean value: 0.9217731829573934

key: train_accuracy
value: [0.9566075  0.94871795 0.96055227 0.96646943 0.93897638 0.92913386
 0.92125984 0.94488189 0.96259843 0.95866142]

mean value: 0.948785895106307

key: test_fscore
value: [0.94545455 0.96428571 0.94915254 0.94736842 0.83333333 0.93333333
 0.85714286 0.9122807  0.94545455 0.96428571]

mean value: 0.9252091708469943

key: train_fscore
value: [0.95652174 0.9488189  0.96108949 0.96579477 0.93933464 0.93207547
 0.92537313 0.94676806 0.96207585 0.95857988]

mean value: 0.9496431934331271

key: test_precision
value: [0.96296296 0.96428571 0.93333333 0.96428571 0.78125    0.875
 0.77142857 0.89655172 0.96296296 0.96428571]

mean value: 0.9076346697682904

key: train_precision
value: [0.96031746 0.9488189  0.94636015 0.98360656 0.93385214 0.89492754
 0.87943262 0.91544118 0.9757085  0.96047431]

mean value: 0.9398939355807465

key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.89285714 1.
 0.96428571 0.92857143 0.92857143 0.96428571]

mean value: 0.9467980295566503

key: train_recall
value: [0.95275591 0.9488189  0.97628458 0.9486166  0.94488189 0.97244094
 0.97637795 0.98031496 0.9488189  0.95669291]

mean value: 0.9606003547975476

key: test_roc_auc
value: [0.94704433 0.96490148 0.94704433 0.9476601  0.82142857 0.92857143
 0.83928571 0.91071429 0.94642857 0.96428571]

mean value: 0.9217364532019705

key: train_roc_auc
value: [0.95661511 0.94871775 0.96058324 0.96643428 0.93897638 0.92913386
 0.92125984 0.94488189 0.96259843 0.95866142]

mean value: 0.9487862189163113

key: test_jcc
value: [0.89655172 0.93103448 0.90322581 0.9        0.71428571 0.875
 0.75       0.83870968 0.89655172 0.93103448]

mean value: 0.8636393611949785

key: train_jcc
value: [0.91666667 0.90262172 0.92509363 0.93385214 0.88560886 0.87279152
 0.86111111 0.89891697 0.92692308 0.92045455]

mean value: 0.904404023907068

MCC on Blind test: 0.32

Accuracy on Blind test: 0.78
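A single confusion matrix ties the fold-level metrics together. As a worked check, the first SGDClassifier test fold can be rebuilt from its logged recall (26/28), precision (26/27) and accuracy (54/57). These counts are inferred from those values and are not printed by the script, and label 1 is assumed to be the positive class. The logged test_roc_auc equals balanced accuracy, which is what ROC AUC reduces to when it is computed from hard 0/1 predictions rather than scores. That is a consistent reading of the numbers, not something the log states.

```python
import math

# Hypothetical reconstruction of the first SGDClassifier test fold (57 rows).
# Counts are inferred from the logged values, not printed by the script:
# recall 0.92857143 = 26/28, precision 0.96296296 = 26/27, accuracy 0.94736842 = 54/57.
tp, fp, fn = 26, 1, 2
tn = 57 - tp - fp - fn  # 28

precision = tp / (tp + fp)
recall = tp / (tp + fn)            # sensitivity / true positive rate
specificity = tn / (tn + fp)       # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
fscore = 2 * precision * recall / (precision + recall)
jaccard = tp / (tp + fp + fn)
# ROC AUC of a single hard 0/1 prediction = mean of TPR and TNR (balanced accuracy)
roc_auc_hard = (recall + specificity) / 2
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("mcc", mcc), ("accuracy", accuracy), ("fscore", fscore),
                    ("precision", precision), ("recall", recall),
                    ("roc_auc", roc_auc_hard), ("jcc", jaccard)]:
    print(f"{name}: {value:.8f}")
```

Each printed value should agree with the first entry of the matching test_* list above to the logged precision.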

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.1140337  0.10048437 0.10199213 0.10053515 0.10115218 0.10382915
 0.10217881 0.09598231 0.0961473  0.09758639]

mean value: 0.10139214992523193

key: score_time
value: [0.01537442 0.01496673 0.01581311 0.01557422 0.01550794 0.01505017
 0.01430726 0.01472378 0.01485848 0.01430631]

mean value: 0.01504824161529541

key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82618439 1.
 0.96490128 0.93094934 0.92857143 0.85933785]

mean value: 0.9267046893623845

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1.
 0.98214286 0.96428571 0.96428571 0.92857143]

mean value: 0.9627192982456141

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96666667 0.91525424 1.
 0.98245614 0.96551724 0.96428571 0.93103448]

mean value: 0.9636835620212532

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96551724 0.93548387 0.87096774 1.
 0.96551724 0.93333333 0.96428571 0.9       ]

mean value: 0.9499390857566609

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 1.         0.96428571 1.
 1.         1.         0.96428571 0.96428571]

mean value: 0.9786945812807882

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1.
 0.98214286 0.96428571 0.96428571 0.92857143]

mean value: 0.9626231527093597

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93548387 0.84375    1.
 0.96551724 0.93333333 0.93103448 0.87096774]

mean value: 0.9308740200752158

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.39
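For binary labels the Jaccard score and F1 are not independent: J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). A short check against the AdaBoost folds above (values copied from the log) shows that, fold by fold, test_jcc is fully determined by test_fscore.

```python
# Per-fold AdaBoost values copied from the log above.
test_fscore = [0.98181818, 0.96428571, 0.96551724, 0.96666667, 0.91525424, 1.0,
               0.98245614, 0.96551724, 0.96428571, 0.93103448]
test_jcc = [0.96428571, 0.93103448, 0.93333333, 0.93548387, 0.84375, 1.0,
            0.96551724, 0.93333333, 0.93103448, 0.87096774]


def jaccard_from_f1(f1):
    """Binary case: J = TP/(TP+FP+FN), F1 = 2TP/(2TP+FP+FN), hence J = F1/(2 - F1)."""
    return f1 / (2.0 - f1)


derived = [jaccard_from_f1(f) for f in test_fscore]
max_err = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest difference: {max_err:.1e}")  # only 8-digit rounding noise remains
```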

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03926826 0.03829074 0.04713559 0.05223989 0.04820871 0.04241133
 0.03772259 0.03749561 0.03793573 0.04857802]

mean value: 0.042928647994995114

key: score_time
value: [0.0239563  0.02649641 0.02141261 0.03752351 0.02890968 0.03851295
 0.02284622 0.0233736  0.02389741 0.0230751 ]

mean value: 0.02700037956237793

key: test_mcc
value: [0.96547546 0.92980296 0.8953202  0.93202124 0.82618439 1.
 0.96490128 0.89342711 0.93094934 0.92857143]

mean value: 0.9266653398520664

key: train_mcc
value: [0.99214142 0.99211042 0.99214118 1.         0.98819663 0.98428248
 0.98825791 1.         0.99212598 0.98819663]

mean value: 0.991745267193298

key: test_accuracy
value: [0.98245614 0.96491228 0.94736842 0.96491228 0.91071429 1.
 0.98214286 0.94642857 0.96428571 0.96428571]

mean value: 0.9627506265664161

key: train_accuracy
value: [0.99605523 0.99605523 0.99605523 1.         0.99409449 0.99212598
 0.99409449 1.         0.99606299 0.99409449]

mean value: 0.9958638121418255

key: test_fscore
value: [0.98181818 0.96428571 0.94736842 0.96666667 0.91525424 1.
 0.98245614 0.94736842 0.96296296 0.96428571]

mean value: 0.9632466459763516

key: train_fscore
value: [0.99604743 0.99606299 0.99603175 1.         0.99408284 0.99209486
 0.99405941 1.         0.99606299 0.99408284]

mean value: 0.99585251091878

key: test_precision
value: [1.         0.96428571 0.96428571 0.93548387 0.87096774 1.
 0.96551724 0.93103448 1.         0.96428571]

mean value: 0.9595860479898299

key: train_precision
value: [1.         0.99606299 1.         1.         0.99604743 0.99603175
 1.         1.         0.99606299 0.99604743]

mean value: 0.9980252591943793

key: test_recall
value: [0.96428571 0.96428571 0.93103448 1.         0.96428571 1.
 1.         0.96428571 0.92857143 0.96428571]

mean value: 0.968103448275862

key: train_recall
value: [0.99212598 0.99606299 0.99209486 1.         0.99212598 0.98818898
 0.98818898 1.         0.99606299 0.99212598]

mean value: 0.9936976751423858

key: test_roc_auc
value: [0.98214286 0.96490148 0.9476601  0.96428571 0.91071429 1.
 0.98214286 0.94642857 0.96428571 0.96428571]

mean value: 0.9626847290640395

key: train_roc_auc
value: [0.99606299 0.99605521 0.99604743 1.         0.99409449 0.99212598
 0.99409449 1.         0.99606299 0.99409449]

mean value: 0.9958638075378917

key: test_jcc
value: [0.96428571 0.93103448 0.9        0.93548387 0.84375    1.
 0.96551724 0.9        0.92857143 0.93103448]

mean value: 0.9299677220721436

key: train_jcc
value: [0.99212598 0.99215686 0.99209486 1.         0.98823529 0.98431373
 0.98818898 1.         0.99215686 0.98823529]

mean value: 0.9917507861505687

MCC on Blind test: 0.14

Accuracy on Blind test: 0.37
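The BaggingClassifier warnings have a simple cause. The model is built with oob_score=True but without n_estimators, so scikit-learn's default of 10 bootstrap estimators applies. A training row gets an out-of-bag prediction only if at least one bootstrap sample missed it. Rows that every sample drew have a zero vote count, which triggers the UserWarning at _bagging.py:747 and the 0/0 division reported at line 753. The sketch below estimates how often that happens; the fold size n = 507 is an assumption inferred from train_accuracy 0.99605523 = 505/507.

```python
# Chance that a given training row is never out-of-bag when BaggingClassifier
# draws n_estimators bootstrap samples of size n from n rows
# (assumed defaults: max_samples=1.0, bootstrap=True).
n = 507  # assumed training rows per fold, inferred from train_accuracy 505/507


def p_no_oob(n_rows, n_estimators):
    p_in_bag = 1.0 - (1.0 - 1.0 / n_rows) ** n_rows  # drawn at least once, about 0.632
    return p_in_bag ** n_estimators                 # in-bag for every estimator


for n_estimators in (10, 50, 100):
    p = p_no_oob(n, n_estimators)
    print(f"n_estimators={n_estimators}: P(no OOB)={p:.2e}, expected rows={n * p:.2f}")
```

With 10 estimators roughly 1% of rows, about five per fold, are expected to lack an OOB vote, so the warning is likely on every fit; raising n_estimators (for example to 100) makes the chance negligible, in line with the warning text.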

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.16959524 0.2451508  0.17405295 0.16930294 0.16698885 0.11223626
 0.10615253 0.17977715 0.10680389 0.14715362]

mean value: 0.1577214241027832

key: score_time
value: [0.0270555  0.02043033 0.02078581 0.0202899  0.02019739 0.01269197
 0.01305366 0.02091789 0.01312304 0.02039123]

mean value: 0.018893671035766602

key: test_mcc
value: [0.8953202  0.86189955 0.82512315 0.82490815 0.75047877 0.78571429
 0.64450339 0.75047877 0.64951905 0.85714286]

mean value: 0.7845088175007775

key: train_mcc
value: [0.85051239 0.85019923 0.84231823 0.8428767  0.85465533 0.84293789
 0.84677832 0.85513299 0.87062545 0.84677832]

mean value: 0.8502814833818734

key: test_accuracy
value: [0.94736842 0.92982456 0.9122807  0.9122807  0.875      0.89285714
 0.82142857 0.875      0.82142857 0.92857143]

mean value: 0.8916040100250626

key: train_accuracy
value: [0.92504931 0.92504931 0.92110454 0.92110454 0.92716535 0.92125984
 0.92322835 0.92716535 0.93503937 0.92322835]

mean value: 0.924939430648092

key: test_fscore
value: [0.94736842 0.93103448 0.9122807  0.91525424 0.87719298 0.89285714
 0.82758621 0.87272727 0.83333333 0.92857143]

mean value: 0.8938206209695644

key: train_fscore
value: [0.92635659 0.92578125 0.92156863 0.92248062 0.92815534 0.92248062
 0.92427184 0.92870906 0.93617021 0.92427184]

mean value: 0.9260246004677202

key: test_precision
value: [0.93103448 0.9        0.92857143 0.9        0.86206897 0.89285714
 0.8        0.88888889 0.78125    0.92857143]

mean value: 0.8813242337164751

key: train_precision
value: [0.91221374 0.91860465 0.91439689 0.90494297 0.91570881 0.90839695
 0.91187739 0.90943396 0.92015209 0.91187739]

mean value: 0.9127604846176163

key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.89285714
 0.85714286 0.85714286 0.89285714 0.92857143]

mean value: 0.9077586206896552

key: train_recall
value: [0.94094488 0.93307087 0.92885375 0.94071146 0.94094488 0.93700787
 0.93700787 0.9488189  0.95275591 0.93700787]

mean value: 0.9397124272509414

key: test_roc_auc
value: [0.9476601  0.93041872 0.91256158 0.91194581 0.875      0.89285714
 0.82142857 0.875      0.82142857 0.92857143]

mean value: 0.8916871921182267

key: train_roc_auc
value: [0.9250179  0.92503346 0.92111979 0.92114313 0.92716535 0.92125984
 0.92322835 0.92716535 0.93503937 0.92322835]

mean value: 0.9249400890106128

key: test_jcc
value: [0.9        0.87096774 0.83870968 0.84375    0.78125    0.80645161
 0.70588235 0.77419355 0.71428571 0.86666667]

mean value: 0.8102157314538718

key: train_jcc
value: [0.86281588 0.86181818 0.85454545 0.85611511 0.86594203 0.85611511
 0.85920578 0.86690647 0.88       0.85920578]

mean value: 0.862266979281973

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.26614237 0.2468183  0.2485559  0.24616241 0.24786353 0.24828482
 0.25354338 0.26733065 0.25087976 0.24782729]

mean value: 0.25234084129333495

key: score_time
value: [0.00884628 0.00858235 0.00861478 0.00854993 0.0087781  0.00885868
 0.00969672 0.00946689 0.0085597  0.00855279]

mean value: 0.008850622177124023

key: test_mcc
value: [0.96547546 0.92980296 0.96547546 0.93202124 0.82195294 1.
 0.96490128 0.89342711 0.96490128 0.92857143]

mean value: 0.9366529157151744

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 0.96491228 0.91071429 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9680451127819548

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98181818 0.96428571 0.98305085 0.96666667 0.9122807  1.
 0.98245614 0.94736842 0.98181818 0.96428571]

mean value: 0.9684030569489981

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96666667 0.93548387 0.89655172 1.
 0.96551724 0.93103448 1.         0.96428571]

mean value: 0.9623825414481699

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 1.         1.         0.92857143 1.
 1.         0.96428571 0.96428571 0.96428571]

mean value: 0.975

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98214286 0.96490148 0.98214286 0.96428571 0.91071429 1.
 0.98214286 0.94642857 0.98214286 0.96428571]

mean value: 0.9679187192118227

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96428571 0.93103448 0.96666667 0.93548387 0.83870968 1.
 0.96551724 0.9        0.96428571 0.93103448]

mean value: 0.9397017850521744

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.3
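The QDA run that follows warns "Variables are collinear". One plausible contributor, assumed here because the log does not name the offending columns, is the OneHotEncoder step: with no level dropped, the indicator columns of each categorical feature sum to 1 on every row, so their covariance matrix, and each per-class covariance that QDA estimates, is singular. A toy two-level column shows the effect.

```python
# Toy categorical column with two levels, one-hot encoded without dropping a level.
levels = ["a", "b", "a", "a", "b", "b", "a"]
onehot = [[1.0 if v == lvl else 0.0 for lvl in ("a", "b")] for v in levels]


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


cov = covariance(onehot)
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
print("row sums:", {sum(r) for r in onehot})  # every row sums to 1
print("det(cov):", det)  # zero up to rounding: the two columns are collinear
```

If that is the cause, OneHotEncoder(drop='first') or a non-zero QuadraticDiscriminantAnalysis(reg_param=...) are the usual remedies.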

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
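
Each `Running model pipeline` entry has the same shape. A `ColumnTransformer` min-max scales the numerical columns, one-hot encodes the categorical ones and passes any remaining columns through, and the estimator follows it. The sketch below reproduces that structure on a toy frame. The short column lists and the extra `extra` column are placeholders standing in for the 46 numerical and 7 categorical features. Only the preprocessing step is exercised, so the QDA estimator is not fitted.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Placeholder feature lists (the logged run uses 46 numerical and 7 categorical columns).
numerical_cols = ["ligand_distance", "rsa"]
categorical_cols = ["ss_class"]

preprocessor = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), numerical_cols),       # rescale each column to [0, 1]
        ("cat", OneHotEncoder(), categorical_cols),    # one indicator column per category
    ],
    remainder="passthrough",  # columns not listed above are kept unchanged
)

pipeline = Pipeline(steps=[("prep", preprocessor), ("model", QuadraticDiscriminantAnalysis())])

# Toy frame: 'extra' is not in either list, so it is passed through.
df = pd.DataFrame({
    "ligand_distance": [5.0, 10.0, 15.0, 20.0],
    "rsa": [0.1, 0.5, 0.9, 0.3],
    "ss_class": ["H", "E", "L", "H"],
    "extra": [1, 2, 3, 4],
})

# Output columns: 2 scaled + 3 one-hot (E, H, L) + 1 passthrough.
X = preprocessor.fit_transform(df)
print(X.shape)
```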

key: fit_time
value: [0.01340175 0.01410508 0.01441169 0.01409388 0.02937245 0.01492167
 0.0153625  0.01417089 0.01423931 0.01514316]

mean value: 0.015922236442565917

key: score_time
value: [0.01146483 0.0109961  0.01089525 0.01093698 0.01172209 0.01099157
 0.01097131 0.01087904 0.01158166 0.01099563]

mean value: 0.01114344596862793

key: test_mcc
value: [0.76550573 0.75462449 0.79161589 0.68850906 0.50518149 0.47187011
 0.68250015 0.67900461 0.79385662 0.67900461]

mean value: 0.6811672742900674

key: train_mcc
value: [0.79484005 0.76863111 0.78816439 0.79111205 0.71433965 0.76123378
 0.77349899 0.80474782 0.76277007 0.76987347]

mean value: 0.7729211371212047

key: test_accuracy
value: [0.87719298 0.87719298 0.89473684 0.84210526 0.75       0.73214286
 0.83928571 0.83928571 0.89285714 0.83928571]

mean value: 0.8384085213032582

key: train_accuracy
value: [0.89546351 0.88362919 0.89151874 0.89349112 0.84251969 0.87795276
 0.88385827 0.9015748  0.87795276 0.88385827]

mean value: 0.8831819099535635

key: test_fscore
value: [0.8627451  0.87272727 0.89285714 0.83636364 0.73076923 0.75409836
 0.83018868 0.83636364 0.88461538 0.83636364]

mean value: 0.8337092078000177

key: train_fscore
value: [0.89026915 0.88032454 0.88469602 0.8875     0.81651376 0.88475836
 0.87631027 0.89837398 0.86919831 0.8793456 ]

mean value: 0.8767290009085706

key: test_precision
value: [0.95652174 0.88888889 0.92592593 0.88461538 0.79166667 0.6969697
 0.88       0.85185185 0.95833333 0.85185185]

mean value: 0.8686625339234035

key: train_precision
value: [0.93886463 0.90794979 0.94196429 0.93832599 0.97802198 0.83802817
 0.93721973 0.92857143 0.93636364 0.91489362]

mean value: 0.9260203256453761

key: test_recall
value: [0.78571429 0.85714286 0.86206897 0.79310345 0.67857143 0.82142857
 0.78571429 0.82142857 0.82142857 0.82142857]

mean value: 0.8048029556650246

key: train_recall
value: [0.84645669 0.85433071 0.83399209 0.84189723 0.7007874  0.93700787
 0.82283465 0.87007874 0.81102362 0.84645669]

mean value: 0.8364865706015997

key: test_roc_auc
value: [0.87561576 0.87684729 0.8953202  0.8429803  0.75       0.73214286
 0.83928571 0.83928571 0.89285714 0.83928571]

mean value: 0.8383620689655172

key: train_roc_auc
value: [0.89556036 0.88368709 0.8914055  0.89338956 0.84251969 0.87795276
 0.88385827 0.9015748  0.87795276 0.88385827]

mean value: 0.8831759048893593

key: test_jcc
value: [0.75862069 0.77419355 0.80645161 0.71875    0.57575758 0.60526316
 0.70967742 0.71875    0.79310345 0.71875   ]

mean value: 0.7179317452228509

key: train_jcc
value: [0.80223881 0.78623188 0.79323308 0.79775281 0.68992248 0.79333333
 0.77985075 0.81549815 0.76865672 0.78467153]

mean value: 0.7811389546191971

MCC on Blind test: 0.3

Accuracy on Blind test: 0.71
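
The `UserWarning: Variables are collinear` lines in this QDA block come from scikit-learn's `QuadraticDiscriminantAnalysis`. It raises the warning when the centred data for a class is rank-deficient, which one-hot encoded categories and strongly correlated numerical features both make likely. A common mitigation is to drop one feature from each highly correlated pair before fitting. The sketch below assumes a pandas DataFrame of numerical features. The helper name `correlated_pairs` and the 0.95 threshold are hypothetical choices, not values from the script.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.95):
    """Return (col_a, col_b, |r|) for column pairs with |Pearson r| >= threshold."""
    corr = df.corr().abs()
    cols = corr.columns
    pairs = []
    # Walk only the upper triangle so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if r >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    a = rng.normal(size=100)
    toy = pd.DataFrame({
        "feat_a": a,
        "feat_a_scaled": 2.0 * a + 1.0,   # perfectly collinear with feat_a
        "feat_b": rng.normal(size=100),   # independent noise
    })
    pairs = correlated_pairs(toy)
    print(pairs)
    # Drop the second member of each flagged pair.
    to_drop = {b for _, b, _ in pairs}
    print(toy.drop(columns=sorted(to_drop)).columns.tolist())
```

This only addresses correlated numerical columns. Rank deficiency caused by one-hot indicators within a class would need a separate choice, such as dropping a reference category.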

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01172638 0.01136661 0.01153398 0.02457762 0.01147556 0.01133466
 0.0113709  0.02615333 0.0302875  0.0303762 ]

mean value: 0.018020272254943848

key: score_time
value: [0.01069403 0.01067114 0.01076221 0.01973557 0.01065874 0.01060033
 0.01062369 0.01272726 0.01075745 0.01385522]

mean value: 0.012108564376831055

key: test_mcc
value: [0.8953202  0.8953202  0.85960591 0.79110556 0.71611487 0.82195294
 0.67900461 0.71611487 0.68250015 0.82195294]

mean value: 0.7878992256362354

key: train_mcc
value: [0.83472439 0.83904026 0.81877755 0.82280791 0.83123063 0.8154727
 0.81142619 0.82718204 0.8431734  0.81527029]

mean value: 0.8259105358013283

key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.89473684 0.85714286 0.91071429
 0.83928571 0.85714286 0.83928571 0.91071429]

mean value: 0.8933583959899749

key: train_accuracy
value: [0.91715976 0.91913215 0.90927022 0.9112426  0.91535433 0.90748031
 0.90551181 0.91338583 0.92125984 0.90748031]

mean value: 0.9127277174672692

key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.9        0.86206897 0.90909091
 0.84210526 0.85185185 0.84745763 0.90909091]

mean value: 0.8947436850691334

key: train_fscore
value: [0.91860465 0.92100193 0.91015625 0.9122807  0.91682785 0.90909091
 0.90697674 0.91472868 0.92277992 0.90873786]

mean value: 0.9141185505002607

key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.87096774 0.83333333 0.92592593
 0.82758621 0.88461538 0.80645161 0.92592593]

mean value: 0.8867909579811694

key: train_precision
value: [0.90458015 0.90188679 0.8996139  0.9        0.90114068 0.89353612
 0.89312977 0.90076336 0.90530303 0.89655172]

mean value: 0.899650553503409

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
 0.85714286 0.82142857 0.89285714 0.89285714]

mean value: 0.904064039408867

key: train_recall
value: [0.93307087 0.94094488 0.92094862 0.92490119 0.93307087 0.92519685
 0.92125984 0.92913386 0.94094488 0.92125984]

mean value: 0.9290731692135321

key: test_roc_auc
value: [0.9476601  0.9476601  0.92980296 0.89408867 0.85714286 0.91071429
 0.83928571 0.85714286 0.83928571 0.91071429]

mean value: 0.8933497536945814

key: train_roc_auc
value: [0.91712832 0.91908904 0.90929321 0.91126949 0.91535433 0.90748031
 0.90551181 0.91338583 0.92125984 0.90748031]

mean value: 0.9127252497588

key: test_jcc
value: [0.9        0.9        0.87096774 0.81818182 0.75757576 0.83333333
 0.72727273 0.74193548 0.73529412 0.83333333]

mean value: 0.811789431315048

key: train_jcc
value: [0.84946237 0.85357143 0.83512545 0.83870968 0.84642857 0.83333333
 0.82978723 0.84285714 0.85663082 0.83274021]

mean value: 0.8418646239168347

MCC on Blind test: 0.25

Accuracy on Blind test: 0.71
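
The `key: ... / value: ... / mean value: ...` entries match the dictionary that scikit-learn's `cross_validate` returns when given a named scoring dict and `return_train_score=True`. It contains `fit_time` and `score_time`, then alternating `test_<name>` and `train_<name>` arrays with one value per fold. The sketch below reproduces that layout on synthetic data. The scorer set and the 10-fold `StratifiedKFold` splitter are assumptions. Using `make_scorer(roc_auc_score)` on hard predictions is also an assumption: the logged `test_roc_auc` values track accuracy closely, which is consistent with AUC computed from predicted labels but does not prove it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumed scorer set; the dict keys become the test_/train_ key suffixes.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard predictions here
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(RidgeClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```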

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.11566854 0.13413811 0.27206016 0.20076489 0.19601989 0.19559526
 0.2167592  0.19660378 0.19614172 0.1994133 ]

mean value: 0.19231648445129396

key: score_time
value: [0.01090479 0.02036548 0.0200057  0.02095532 0.0202384  0.01987505
 0.0205245  0.01911926 0.01085591 0.01083922]

mean value: 0.017368364334106445

key: test_mcc
value: [0.85960591 0.8953202  0.85960591 0.82490815 0.75434227 0.82195294
 0.71611487 0.71611487 0.68250015 0.85933785]

mean value: 0.7989803124894794

key: train_mcc
value: [0.86225372 0.86654135 0.85053095 0.85053095 0.87062545 0.85513299
 0.83505996 0.85465533 0.86710997 0.8431734 ]

mean value: 0.8555614064171675

key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807  0.875      0.91071429
 0.85714286 0.85714286 0.83928571 0.92857143]

mean value: 0.8987155388471177

key: train_accuracy
value: [0.93096647 0.93293886 0.92504931 0.92504931 0.93503937 0.92716535
 0.91732283 0.92716535 0.93307087 0.92125984]

mean value: 0.927502756682042

key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
 0.86206897 0.85185185 0.84745763 0.92592593]

mean value: 0.8999979781378779

key: train_fscore
value: [0.93203883 0.93436293 0.92607004 0.92607004 0.93617021 0.92870906
 0.91860465 0.92815534 0.93461538 0.92277992]

mean value: 0.9287576414141969

key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9        0.83870968 0.92592593
 0.83333333 0.88461538 0.80645161 0.96153846]

mean value: 0.8941214789824357

key: train_precision
value: [0.91954023 0.91666667 0.91187739 0.91187739 0.92015209 0.90943396
 0.90458015 0.91570881 0.91353383 0.90530303]

mean value: 0.9128673569164447

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
 0.89285714 0.82142857 0.89285714 0.89285714]

mean value: 0.9076354679802956

key: train_recall
value: [0.94488189 0.95275591 0.94071146 0.94071146 0.95275591 0.9488189
 0.93307087 0.94094488 0.95669291 0.94094488]

mean value: 0.9452289066633469

key: test_roc_auc
value: [0.92980296 0.9476601  0.92980296 0.91194581 0.875      0.91071429
 0.85714286 0.85714286 0.83928571 0.92857143]

mean value: 0.8987068965517242

key: train_roc_auc
value: [0.93093897 0.93289969 0.92508014 0.92508014 0.93503937 0.92716535
 0.91732283 0.92716535 0.93307087 0.92125984]

mean value: 0.927502256387912

key: test_jcc
value: [0.86666667 0.9        0.87096774 0.84375    0.78787879 0.83333333
 0.75757576 0.74193548 0.73529412 0.86206897]

mean value: 0.8199470854425297

key: train_jcc
value: [0.87272727 0.87681159 0.86231884 0.86231884 0.88       0.86690647
 0.84946237 0.86594203 0.87725632 0.85663082]

mean value: 0.8670374559548931

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7
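
The `SettingWithCopyWarning` messages from `rpob_config.py` (lines 163 and 166) appear because `sort_values(..., inplace=True)` is called on `ros_CT` and `ros_BT`. pandas flags both as possibly derived from another DataFrame by indexing. The usual fix is to take an explicit `.copy()` of the selection, or to assign the sorted result instead of sorting in place. The sketch below assumes these frames are row/column selections of a larger results table; the `results` frame and its `resampling` column are hypothetical stand-ins, with MCC values rounded from this log for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the combined results table the script slices from.
results = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.68, 0.79, 0.80],
    "bts_mcc": [0.30, 0.25, 0.25],
    "resampling": ["ros", "ros", "ros"],
})

# Option 1: make the selection an independent DataFrame, then sort in place.
ros_CT = results.loc[results["resampling"] == "ros", ["model", "test_mcc"]].copy()
ros_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: skip inplace and rebind the name to the sorted result.
ros_BT = results.loc[results["resampling"] == "ros", ["model", "bts_mcc"]]
ros_BT = ros_BT.sort_values(by=["bts_mcc"], ascending=False)

print(ros_CT)
print(ros_BT)
```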

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02059269 0.0403645  0.0256114  0.04780722 0.05848503 0.02307177
 0.02302694 0.02297401 0.02418137 0.02429724]

mean value: 0.031041216850280762

key: score_time
value: [0.0107677  0.01078558 0.01102161 0.01087213 0.01087928 0.01069665
 0.01075411 0.01068592 0.0107131  0.01078057]

mean value: 0.010795664787292481

key: test_mcc
value: [0.63745526 0.78410665 0.60000053 0.89139151 0.78410665 0.89139151
 0.89153439 0.86334835 0.89139151 0.81854376]

mean value: 0.805327011589605

key: train_mcc
value: [0.83096715 0.83450632 0.8435716  0.82679606 0.83450632 0.8224719
 0.83074746 0.83041633 0.82643766 0.82660248]

mean value: 0.830702327712044

key: test_accuracy
value: [0.81818182 0.89090909 0.8        0.94545455 0.89090909 0.94545455
 0.94545455 0.92727273 0.94545455 0.90909091]

mean value: 0.9018181818181819

key: train_accuracy
value: [0.91515152 0.91717172 0.92121212 0.91313131 0.91717172 0.91111111
 0.91515152 0.91515152 0.91313131 0.91313131]

mean value: 0.9151515151515152

key: test_fscore
value: [0.80769231 0.89285714 0.79245283 0.94339623 0.89285714 0.94736842
 0.94545455 0.93333333 0.94736842 0.9122807 ]

mean value: 0.9015061072657895

key: train_fscore
value: [0.91699605 0.91816367 0.92337917 0.91485149 0.91816367 0.912
 0.91633466 0.91566265 0.91382766 0.91417166]

mean value: 0.9163550676695618

key: test_precision
value: [0.84       0.86206897 0.80769231 0.96153846 0.86206897 0.93103448
 0.96296296 0.875      0.93103448 0.89655172]

mean value: 0.8929952352883387

key: train_precision
value: [0.89922481 0.90909091 0.90038314 0.89883268 0.90909091 0.90118577
 0.90196078 0.90836653 0.9047619  0.9015748 ]

mean value: 0.903447224781149

key: test_recall
value: [0.77777778 0.92592593 0.77777778 0.92592593 0.92592593 0.96428571
 0.92857143 1.         0.96428571 0.92857143]

mean value: 0.9119047619047619

key: train_recall
value: [0.93548387 0.92741935 0.94758065 0.93145161 0.92741935 0.92307692
 0.93117409 0.92307692 0.92307692 0.92712551]

mean value: 0.9296885203082147

key: test_roc_auc
value: [0.81746032 0.89153439 0.79960317 0.94510582 0.89153439 0.94510582
 0.9457672  0.92592593 0.94510582 0.90873016]

mean value: 0.9015873015873016

key: train_roc_auc
value: [0.91511036 0.91715097 0.92115874 0.91309423 0.91715097 0.91113524
 0.91518382 0.91516749 0.91315136 0.91315953]

mean value: 0.9151462713856602

key: test_jcc
value: [0.67741935 0.80645161 0.65625    0.89285714 0.80645161 0.9
 0.89655172 0.875      0.9        0.83870968]

mean value: 0.8249691125059591

key: train_jcc
value: [0.84671533 0.84870849 0.85766423 0.84306569 0.84870849 0.83823529
 0.84558824 0.84444444 0.84132841 0.84191176]

mean value: 0.8456370381490419

MCC on Blind test: 0.28

Accuracy on Blind test: 0.7
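
The `ConvergenceWarning: lbfgs failed to converge` messages that follow come from fitting `LogisticRegressionCV(random_state=42)`. That model keeps scikit-learn's default `max_iter=100` for every candidate `C` in every internal fold. The warning text suggests raising `max_iter` or scaling the data. The pipeline already min-max scales the features, so a higher iteration cap is the first thing to try. The sketch below uses synthetic data; `max_iter=1000` is an arbitrary illustrative value, not one taken from the script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipeline = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # Raise the lbfgs iteration cap above the default of 100.
    ("model", LogisticRegressionCV(max_iter=1000, random_state=42)),
])
pipeline.fit(X, y)

model = pipeline.named_steps["model"]
print("chosen C:", model.C_[0])
print("training accuracy:", round(pipeline.score(X, y), 3))
```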

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.64936924 0.66852713 0.79406118 0.67012405 0.69154596 0.8323493
 0.78064322 0.67432141 0.84689832 0.68367529]

mean value: 0.7291515111923218

key: score_time
value: [0.01181173 0.01206112 0.01192927 0.01092577 0.01228356 0.01233196
 0.01223803 0.01151776 0.0125103  0.01225424]

mean value: 0.011986374855041504

key: test_mcc
value: [0.82269299 0.92980214 0.63745526 0.85449735 0.89642146 0.96423926
 0.92980214 0.81878307 0.96423926 0.8565805 ]

mean value: 0.8674513428146936

key: train_mcc
value: [0.92727243 0.93132101 0.94355919 0.93535327 0.92730389 0.93131989
 0.92324017 0.94355551 0.94346399 0.93538276]

mean value: 0.9341772124772538

key: test_accuracy
value: [0.90909091 0.96363636 0.81818182 0.92727273 0.94545455 0.98181818
 0.96363636 0.90909091 0.98181818 0.92727273]

mean value: 0.9327272727272727

key: train_accuracy
value: [0.96363636 0.96565657 0.97171717 0.96767677 0.96363636 0.96565657
 0.96161616 0.97171717 0.97171717 0.96767677]

mean value: 0.9670707070707071

key: test_fscore
value: [0.90196078 0.96428571 0.80769231 0.92592593 0.94736842 0.98245614
 0.96296296 0.90909091 0.98245614 0.93103448]

mean value: 0.9315233788784553

key: train_fscore
value: [0.96370968 0.96565657 0.97154472 0.96774194 0.96356275 0.96551724
 0.96161616 0.97142857 0.97154472 0.96747967]

mean value: 0.9669802011711329

key: test_precision
value: [0.95833333 0.93103448 0.84       0.92592593 0.9        0.96551724
 1.         0.92592593 0.96551724 0.9       ]

mean value: 0.9312254150702427

key: train_precision
value: [0.96370968 0.96761134 0.9795082  0.96774194 0.96747967 0.96747967
 0.95967742 0.97942387 0.9755102  0.97142857]

mean value: 0.9699570558428222

key: test_recall
value: [0.85185185 1.         0.77777778 0.92592593 1.         1.
 0.92857143 0.89285714 1.         0.96428571]

mean value: 0.9341269841269841

key: train_recall
value: [0.96370968 0.96370968 0.96370968 0.96774194 0.95967742 0.96356275
 0.96356275 0.96356275 0.96761134 0.96356275]

mean value: 0.9640410735274912

key: test_roc_auc
value: [0.90806878 0.96428571 0.81746032 0.92724868 0.94642857 0.98148148
 0.96428571 0.90939153 0.98148148 0.9265873 ]

mean value: 0.9326719576719578

key: train_roc_auc
value: [0.96363622 0.96566051 0.97173338 0.96767664 0.96364438 0.96565234
 0.96162009 0.97170073 0.97170889 0.96766847]

mean value: 0.9670701645553089

key: test_jcc
value: [0.82142857 0.93103448 0.67741935 0.86206897 0.9        0.96551724
 0.92857143 0.83333333 0.96551724 0.87096774]

mean value: 0.8755858361142009

key: train_jcc
value: [0.92996109 0.93359375 0.94466403 0.9375     0.9296875  0.93333333
 0.92607004 0.94444444 0.94466403 0.93700787]

mean value: 0.9360926093439301

MCC on Blind test: 0.23

Accuracy on Blind test: 0.65

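The per-fold scores above can be checked by hand from confusion-matrix counts. This sketch assumes a binary task with class 1 as the positive class. The counts TP=25, FP=2, FN=3, TN=25 are not printed in the log. They are inferred because they reproduce every LogisticRegressionCV score for the eighth test fold (55 samples): MCC 0.81878307, accuracy 0.90909091, precision 0.92592593, recall 0.89285714, F-score 0.90909091 and Jaccard 0.83333333. `fold_metrics` is a hypothetical helper, not a function from the original script.

```python
import math


def fold_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary-classification scores from confusion-matrix counts (positive class = 1)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity / true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),    # Jaccard index of the positive class
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts inferred (not logged) for LogisticRegressionCV, test fold 8
scores = fold_metrics(tp=25, fp=2, fn=3, tn=25)
for name, value in scores.items():
    print(f"{name}: {value:.8f}")
```

For this fold, the balanced accuracy (0.90939153) equals the logged `test_roc_auc`. This is what ROC AUC reduces to when it is computed from hard 0/1 predictions rather than from probabilities. That hints at, but does not confirm, how the script scored `roc_auc`.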
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01078415 0.01023746 0.00817394 0.00776267 0.00750327 0.00742102
 0.00754666 0.00760245 0.00785923 0.00745296]

mean value: 0.008234381675720215

key: score_time
value: [0.01067472 0.00926948 0.00836229 0.00813413 0.00799251 0.00794363
 0.00806427 0.0079906  0.00824738 0.00798607]

mean value: 0.008466506004333496

key: test_mcc
value: [0.71588202 0.56841568 0.69419497 0.72546624 0.52715278 0.48393864
 0.61131498 0.79069197 0.75878131 0.53758181]

mean value: 0.6413420391304201

key: train_mcc
value: [0.72945173 0.66755872 0.72778077 0.66639453 0.67908612 0.67326481
 0.68618843 0.67555218 0.64604502 0.69582615]

mean value: 0.6847148445519045

key: test_accuracy
value: [0.85454545 0.78181818 0.83636364 0.85454545 0.76363636 0.72727273
 0.8        0.89090909 0.87272727 0.76363636]

mean value: 0.8145454545454546

key: train_accuracy
value: [0.86464646 0.82626263 0.86060606 0.82626263 0.82828283 0.83030303
 0.83636364 0.83030303 0.81616162 0.84242424]

mean value: 0.8361616161616161

key: test_fscore
value: [0.84       0.76       0.80851064 0.83333333 0.75471698 0.68085106
 0.78431373 0.88461538 0.8627451  0.74509804]

mean value: 0.7954184263953551

key: train_fscore
value: [0.86354379 0.80630631 0.85097192 0.80717489 0.80369515 0.81165919
 0.81797753 0.80995475 0.79458239 0.82666667]

mean value: 0.8192532586237161

key: test_precision
value: [0.91304348 0.82608696 0.95       0.95238095 0.76923077 0.84210526
 0.86956522 0.95833333 0.95652174 0.82608696]

mean value: 0.8863354665929036

key: train_precision
value: [0.87242798 0.91326531 0.91627907 0.90909091 0.94054054 0.90954774
 0.91919192 0.91794872 0.89795918 0.91625616]

mean value: 0.9112507526203477

key: test_recall
value: [0.77777778 0.7037037  0.7037037  0.74074074 0.74074074 0.57142857
 0.71428571 0.82142857 0.78571429 0.67857143]

mean value: 0.7238095238095238

key: train_recall
value: [0.85483871 0.72177419 0.79435484 0.72580645 0.7016129  0.73279352
 0.73684211 0.72469636 0.71255061 0.75303644]

mean value: 0.7458306125114275

key: test_roc_auc
value: [0.8531746  0.78042328 0.83399471 0.85251323 0.76322751 0.73015873
 0.8015873  0.89219577 0.87433862 0.76521164]

mean value: 0.8146825396825397

key: train_roc_auc
value: [0.86466632 0.82647414 0.86074017 0.82646598 0.82853925 0.83010644
 0.83616299 0.83009011 0.81595272 0.84224403]

mean value: 0.8361442144442993

key: test_jcc
value: [0.72413793 0.61290323 0.67857143 0.71428571 0.60606061 0.51612903
 0.64516129 0.79310345 0.75862069 0.59375   ]

mean value: 0.6642723366270363

key: train_jcc
value: [0.75985663 0.6754717  0.7406015  0.67669173 0.67181467 0.68301887
 0.69201521 0.68060837 0.65917603 0.70454545]

mean value: 0.6943800160411975

MCC on Blind test: 0.34

Accuracy on Blind test: 0.78

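The following sketch parses the `key:` / `value: [...]` / `mean value:` blocks and recomputes each mean. It assumes the block layout used throughout this log. It also assumes that `mean value` is the plain arithmetic mean of the ten fold scores, which is consistent with the Gaussian NB `test_mcc` block used as sample input here. `parse_cv_blocks` and `LOG_BLOCK` are illustrative names, not part of the original script. The printed fold values are rounded to 8 decimals, so the comparison uses a tolerance.

```python
import re
from statistics import mean

# Sample input copied from the Gaussian NB run above
LOG_BLOCK = """\
key: test_mcc
value: [0.71588202 0.56841568 0.69419497 0.72546624 0.52715278 0.48393864
 0.61131498 0.79069197 0.75878131 0.53758181]

mean value: 0.6413420391304201
"""

# key name, bracketed fold values (may span lines), then the logged mean
BLOCK_RE = re.compile(r"key: (\w+)\s*value: \[([^\]]*)\]\s*mean value: ([0-9.eE+-]+)")


def parse_cv_blocks(text: str) -> dict:
    """Map each logged key to (list of per-fold values, logged mean)."""
    blocks = {}
    for key, values, logged_mean in BLOCK_RE.findall(text):
        folds = [float(v) for v in values.split()]
        blocks[key] = (folds, float(logged_mean))
    return blocks


for key, (folds, logged_mean) in parse_cv_blocks(LOG_BLOCK).items():
    recomputed = mean(folds)
    ok = abs(recomputed - logged_mean) < 1e-6
    print(f"{key}: {len(folds)} folds, recomputed={recomputed:.6f}, logged={logged_mean:.6f}, match={ok}")
```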
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00835395 0.00815177 0.00788021 0.00780869 0.00764942 0.00775027
 0.00766587 0.0077424  0.00780225 0.00769472]

mean value: 0.007849955558776855

key: score_time
value: [0.00892568 0.00875401 0.00813055 0.00789833 0.00855637 0.00791192
 0.0080204  0.00795603 0.00823951 0.00798535]

mean value: 0.008237814903259278

key: test_mcc
value: [0.53452248 0.78410665 0.63745526 0.85449735 0.68504815 0.7112589
 0.85695439 0.78353876 0.85695439 0.63841116]

mean value: 0.7342747491731905

key: train_mcc
value: [0.75012681 0.77383014 0.7860094  0.72613214 0.77778141 0.74958366
 0.76193358 0.74958366 0.72166787 0.74199798]

mean value: 0.7538646637900721

key: test_accuracy
value: [0.76363636 0.89090909 0.81818182 0.92727273 0.83636364 0.85454545
 0.92727273 0.89090909 0.92727273 0.81818182]

mean value: 0.8654545454545455

key: train_accuracy
value: [0.87474747 0.88686869 0.89292929 0.86262626 0.88888889 0.87474747
 0.88080808 0.87474747 0.86060606 0.87070707]

mean value: 0.8767676767676768

key: test_fscore
value: [0.73469388 0.89285714 0.80769231 0.92592593 0.84745763 0.85185185
 0.92592593 0.89655172 0.92592593 0.81481481]

mean value: 0.862369712380149

key: train_fscore
value: [0.87242798 0.888      0.89421158 0.85950413 0.88933602 0.87346939
 0.88223553 0.87346939 0.85773196 0.8677686 ]

mean value: 0.8758154566969916

key: test_precision
value: [0.81818182 0.86206897 0.84       0.92592593 0.78125    0.88461538
 0.96153846 0.86666667 0.96153846 0.84615385]

mean value: 0.8747939530137806

key: train_precision
value: [0.8907563  0.88095238 0.88537549 0.88135593 0.8875502  0.88065844
 0.87007874 0.88065844 0.87394958 0.88607595]

mean value: 0.8817411452335624

key: test_recall
value: [0.66666667 0.92592593 0.77777778 0.92592593 0.92592593 0.82142857
 0.89285714 0.92857143 0.89285714 0.78571429]

mean value: 0.8543650793650793

key: train_recall
value: [0.85483871 0.89516129 0.90322581 0.83870968 0.89112903 0.86639676
 0.89473684 0.86639676 0.84210526 0.85020243]

mean value: 0.8702902572809195

key: test_roc_auc
value: [0.76190476 0.89153439 0.81746032 0.92724868 0.83796296 0.85515873
 0.92791005 0.89021164 0.92791005 0.81878307]

mean value: 0.8656084656084656

key: train_roc_auc
value: [0.87478778 0.8868519  0.89290845 0.86267468 0.88888435 0.87473064
 0.88083616 0.87473064 0.86056876 0.87066573]

mean value: 0.8767639088415828

key: test_jcc
value: [0.58064516 0.80645161 0.67741935 0.86206897 0.73529412 0.74193548
 0.86206897 0.8125     0.86206897 0.6875    ]

mean value: 0.7627952627102008

key: train_jcc
value: [0.77372263 0.79856115 0.80866426 0.75362319 0.80072464 0.77536232
 0.78928571 0.77536232 0.75090253 0.76642336]

mean value: 0.7792632101538037

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00789285 0.00730848 0.00801301 0.00789571 0.0079782  0.0079236
 0.00803757 0.00805783 0.00799203 0.00808692]

mean value: 0.007918620109558105

key: score_time
value: [0.01133084 0.0164237  0.01200867 0.01205802 0.01202822 0.01195431
 0.01304603 0.01204181 0.01285148 0.0128901 ]

mean value: 0.012663316726684571

key: test_mcc
value: [0.63745526 0.63745526 0.56441351 0.85449735 0.61131498 0.81854376
 0.81878307 0.86334835 0.89139151 0.63624339]

mean value: 0.7333446438524339

key: train_mcc
value: [0.80310724 0.76975822 0.81041362 0.76604064 0.82627008 0.79394672
 0.7820578  0.77375802 0.77376541 0.78192653]

mean value: 0.7881044273086666

key: test_accuracy
value: [0.81818182 0.81818182 0.78181818 0.92727273 0.8        0.90909091
 0.90909091 0.92727273 0.94545455 0.81818182]

mean value: 0.8654545454545455

key: train_accuracy
value: [0.9010101  0.88484848 0.90505051 0.88282828 0.91313131 0.8969697
 0.89090909 0.88686869 0.88686869 0.89090909]

mean value: 0.8939393939393939

key: test_fscore
value: [0.80769231 0.80769231 0.76923077 0.92592593 0.81355932 0.9122807
 0.90909091 0.93333333 0.94736842 0.82142857]

mean value: 0.864760256923504

key: train_fscore
value: [0.90373281 0.88438134 0.90656064 0.88492063 0.91313131 0.8969697
 0.892      0.88617886 0.88709677 0.89156627]

mean value: 0.8946538330419604

key: test_precision
value: [0.84       0.84       0.8        0.92592593 0.75       0.89655172
 0.92592593 0.875      0.93103448 0.82142857]

mean value: 0.8605866630176975

key: train_precision
value: [0.88122605 0.88979592 0.89411765 0.87109375 0.91497976 0.89516129
 0.88142292 0.88979592 0.88353414 0.88446215]

mean value: 0.8885589547682757

key: test_recall
value: [0.77777778 0.77777778 0.74074074 0.92592593 0.88888889 0.92857143
 0.89285714 1.         0.96428571 0.82142857]

mean value: 0.8718253968253968

key: train_recall
value: [0.92741935 0.87903226 0.91935484 0.89919355 0.91129032 0.89878543
 0.90283401 0.88259109 0.89068826 0.89878543]

mean value: 0.9009974533106961

key: test_roc_auc
value: [0.81746032 0.81746032 0.78108466 0.92724868 0.8015873  0.90873016
 0.90939153 0.92592593 0.94510582 0.81812169]

mean value: 0.8652116402116402

key: train_roc_auc
value: [0.90095664 0.88486026 0.90502155 0.88279515 0.91313504 0.89697336
 0.89093313 0.88686006 0.88687639 0.89092497]

mean value: 0.893933655478647

key: test_jcc
value: [0.67741935 0.67741935 0.625      0.86206897 0.68571429 0.83870968
 0.83333333 0.875      0.9        0.6969697 ]

mean value: 0.7671634668631332

key: train_jcc
value: [0.82437276 0.79272727 0.82909091 0.79359431 0.8401487  0.81318681
 0.80505415 0.79562044 0.79710145 0.80434783]

mean value: 0.8095244624739278

MCC on Blind test: 0.25

Accuracy on Blind test: 0.72

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01772094 0.01750326 0.01661658 0.01747656 0.01727486 0.0177145
 0.01769352 0.01763797 0.01765108 0.01759934]

mean value: 0.017488861083984376

key: score_time
value: [0.01001716 0.00997353 0.0098772  0.00995278 0.01002908 0.01001978
 0.01004839 0.0100143  0.00999403 0.01007152]

mean value: 0.009999775886535644

key: test_mcc
value: [0.56841568 0.78410665 0.63745526 0.89153439 0.78410665 0.81854376
 0.85695439 0.82269299 0.89139151 0.70899471]

mean value: 0.7764195999656158

key: train_mcc
value: [0.80232908 0.78999446 0.80646861 0.77792658 0.78999446 0.78602685
 0.78224023 0.78184638 0.77794469 0.79000817]

mean value: 0.7884779517517934

key: test_accuracy
value: [0.78181818 0.89090909 0.81818182 0.94545455 0.89090909 0.90909091
 0.92727273 0.90909091 0.94545455 0.85454545]

mean value: 0.8872727272727272

key: train_accuracy
value: [0.9010101  0.89494949 0.9030303  0.88888889 0.89494949 0.89292929
 0.89090909 0.89090909 0.88888889 0.89494949]

mean value: 0.8941414141414141

key: test_fscore
value: [0.76       0.89285714 0.80769231 0.94545455 0.89285714 0.9122807
 0.92592593 0.91525424 0.94736842 0.85714286]

mean value: 0.8856833282025075

key: train_fscore
value: [0.90258449 0.896      0.9047619  0.89021956 0.896      0.89378758
 0.89243028 0.89112903 0.88977956 0.89558233]

mean value: 0.895227473341023

key: test_precision
value: [0.82608696 0.86206897 0.84       0.92857143 0.86206897 0.89655172
 0.96153846 0.87096774 0.93103448 0.85714286]

mean value: 0.8836031583641004

key: train_precision
value: [0.89019608 0.88888889 0.890625   0.88142292 0.88888889 0.88492063
 0.87843137 0.8875502  0.88095238 0.88844622]

mean value: 0.8860322585475027

key: test_recall
value: [0.7037037  0.92592593 0.77777778 0.96296296 0.92592593 0.92857143
 0.89285714 0.96428571 0.96428571 0.85714286]

mean value: 0.8903439153439153

key: train_recall
value: [0.91532258 0.90322581 0.91935484 0.89919355 0.90322581 0.90283401
 0.90688259 0.89473684 0.89878543 0.90283401]

mean value: 0.9046395455139088

key: test_roc_auc
value: [0.78042328 0.89153439 0.81746032 0.9457672  0.89153439 0.90873016
 0.92791005 0.90806878 0.94510582 0.85449735]

mean value: 0.8871031746031747

key: train_roc_auc
value: [0.90098113 0.89493274 0.90299726 0.88886803 0.89493274 0.89294926
 0.8909413  0.89091681 0.88890884 0.89496539]

mean value: 0.8941393496147316

key: test_jcc
value: [0.61290323 0.80645161 0.67741935 0.89655172 0.80645161 0.83870968
 0.86206897 0.84375    0.9        0.75      ]

mean value: 0.799430617352614

key: train_jcc
value: [0.82246377 0.8115942  0.82608696 0.80215827 0.8115942  0.80797101
 0.8057554  0.80363636 0.80144404 0.81090909]

mean value: 0.8103613311859039

MCC on Blind test: 0.22

Accuracy on Blind test: 0.71

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])
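The pipeline repr above scales the numeric columns with `MinMaxScaler`, one-hot encodes the categorical columns, and passes any remaining columns through unchanged before the model step. A minimal sketch of an equivalent construction follows. It assumes a hypothetical toy frame (three made-up columns standing in for the real feature lists) and random labels; it is not the script's own code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numeric columns and one categorical column
# stand in for the real lists (ligand_distance, ..., ss_class, ...).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "rsa": rng.uniform(0, 1, 60),
    "ss_class": rng.choice(["helix", "sheet", "loop"], 60),
})
y = rng.integers(0, 2, 60)

# Split columns by dtype, as the 'num'/'cat' transformers in the log suggest.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",  # unlisted columns are kept unchanged
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", MLPClassifier(max_iter=500, random_state=42))])
pipe.fit(X, y)

# Transformed design matrix: 2 scaled numeric columns + 3 one-hot columns.
Xt = pipe.named_steps["prep"].transform(X)
print(Xt.shape)
```

Fitting the scaler and encoder inside the pipeline means that, under cross-validation, they are refit on each training fold only, so the held-out fold does not influence the scaling.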

key: fit_time
value: [1.48249555 1.45221186 1.51775455 1.50475144 1.40030074 1.4671185
 1.77495503 1.3907702  1.52468777 1.48590899]

mean value: 1.5000954627990724

key: score_time
value: [0.01191258 0.01389217 0.0141511  0.01366138 0.01346612 0.01382709
 0.01353741 0.01376367 0.01363826 0.01388812]

mean value: 0.013573789596557617
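The `fit_time`, `score_time`, `test_<metric>` and `train_<metric>` keys match the dictionary that scikit-learn's `cross_validate` returns when it is given a named scoring dict and `return_train_score=True`. The sketch below reproduces that key layout and the "key / value / mean value" printout. The scorer names, the 10-fold `StratifiedKFold`, and the synthetic data are assumptions chosen to mirror the log; the script's actual scoring setup is not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Hypothetical scoring dict; names chosen so the output keys match the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predict() labels here
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=200, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(DecisionTreeClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print each key with its per-fold array and the mean across folds.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```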

key: test_mcc
value: [0.82269299 0.89153439 0.67602163 0.78353876 0.74603175 0.89153439
 0.89642146 0.82269299 0.86334835 0.7112589 ]

mean value: 0.8105075610922889

key: train_mcc
value: [0.96364438 0.95154681 0.95962779 0.96767664 0.96780409 0.96780199
 0.96364378 0.97575748 0.96770771 0.96770771]

mean value: 0.9652918361453778

key: test_accuracy
value: [0.90909091 0.94545455 0.83636364 0.89090909 0.87272727 0.94545455
 0.94545455 0.90909091 0.92727273 0.85454545]

mean value: 0.9036363636363636

key: train_accuracy
value: [0.98181818 0.97575758 0.97979798 0.98383838 0.98383838 0.98383838
 0.98181818 0.98787879 0.98383838 0.98383838]

mean value: 0.9826262626262626

key: test_fscore
value: [0.90196078 0.94545455 0.82352941 0.88461538 0.87272727 0.94545455
 0.94339623 0.91525424 0.93333333 0.85185185]

mean value: 0.9017577593218595

key: train_fscore
value: [0.98181818 0.9757085  0.97975709 0.98387097 0.98373984 0.98367347
 0.98174442 0.98785425 0.98373984 0.98373984]

mean value: 0.9825646391106369

key: test_precision
value: [0.95833333 0.92857143 0.875      0.92       0.85714286 0.96296296
 1.         0.87096774 0.875      0.88461538]

mean value: 0.9132593708561451

key: train_precision
value: [0.98380567 0.9796748  0.98373984 0.98387097 0.99180328 0.99176955
 0.98373984 0.98785425 0.9877551  0.9877551 ]

mean value: 0.9861768388410251

key: test_recall
value: [0.85185185 0.96296296 0.77777778 0.85185185 0.88888889 0.92857143
 0.89285714 0.96428571 1.         0.82142857]

mean value: 0.8940476190476191

key: train_recall
value: [0.97983871 0.97177419 0.97580645 0.98387097 0.97580645 0.9757085
 0.97975709 0.98785425 0.97975709 0.97975709]

mean value: 0.9789930782290714

key: test_roc_auc
value: [0.90806878 0.9457672  0.83531746 0.89021164 0.87301587 0.9457672
 0.94642857 0.90806878 0.92592593 0.85515873]

mean value: 0.9033730158730159

key: train_roc_auc
value: [0.98182219 0.97576564 0.97980606 0.98383832 0.98385464 0.98382199
 0.98181403 0.98787874 0.98383016 0.98383016]

mean value: 0.9826261917199948

key: test_jcc
value: [0.82142857 0.89655172 0.7        0.79310345 0.77419355 0.89655172
 0.89285714 0.84375    0.875      0.74193548]

mean value: 0.8235371643095503

key: train_jcc
value: [0.96428571 0.95256917 0.96031746 0.96825397 0.968      0.96787149
 0.96414343 0.976      0.968      0.968     ]

mean value: 0.9657441225056214
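The fold-1 test metrics of the MLP block above are all consistent with a single confusion matrix: 27 positives and 28 negatives, with TP = 23, FN = 4, FP = 1, TN = 27. These counts are inferred from the printed scores and do not appear in the log. The check below recomputes every metric from those counts. It also shows that `test_roc_auc` matches ROC AUC computed on hard 0/1 predictions. For hard labels this equals balanced accuracy, (TPR + TNR) / 2, and the Jaccard score equals F1 / (2 - F1).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             jaccard_score, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

# Inferred (not logged) fold-1 confusion counts for the MLP.
tp, fn, fp, tn = 23, 4, 1, 27
y_true = np.array([1] * (tp + fn) + [0] * (fp + tn))
y_pred = np.array([1] * tp + [0] * fn + [1] * fp + [0] * tn)

metrics = {
    "mcc": matthews_corrcoef(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
    "fscore": f1_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_pred),  # hard labels, not probabilities
    "jcc": jaccard_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.8f}")

# With hard labels, ROC AUC collapses to balanced accuracy.
print(f"balanced accuracy: {balanced_accuracy_score(y_true, y_pred):.8f}")
```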

MCC on Blind test: 0.26

Accuracy on Blind test: 0.65
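The blind-test lines report a single MCC and accuracy, rounded to two decimals, for a model evaluated on data kept out of cross-validation. For the MLP, the mean CV test MCC is 0.81 but the blind MCC is 0.26. The log does not show how the blind set is built. The sketch below assumes a simple stratified hold-out on synthetic data and only illustrates the evaluation step: fit on the training portion, predict on the blind portion, then round.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, random_state=42)

# Hypothetical split: the "blind" set is never used during CV or fitting.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = MLPClassifier(max_iter=500, random_state=42).fit(X_train, y_train)
y_hat = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_hat), 2)
blind_acc = round(accuracy_score(y_blind, y_hat), 2)
print(f"MCC on Blind test: {blind_mcc}\n")
print(f"Accuracy on Blind test: {blind_acc}\n")
```

A large gap between CV and blind scores, as in this log, usually points to a distribution shift between the training and blind data, or to overfitting. A random hold-out like the one above would not by itself reproduce such a gap.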

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01426578 0.01250982 0.01104569 0.00998855 0.01048017 0.01040983
 0.01004791 0.00998306 0.00995708 0.01082873]

mean value: 0.010951662063598632

key: score_time
value: [0.01068878 0.00831199 0.00807023 0.00795078 0.007833   0.00783944
 0.00795031 0.00782537 0.00784492 0.00789261]

mean value: 0.008220744132995606

key: test_mcc
value: [0.86334835 0.89153439 0.85449735 0.74569602 0.71735629 0.92724868
 0.92724868 0.82269299 0.8565805  0.89153439]

mean value: 0.8497737644332478

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.92727273 0.94545455 0.92727273 0.87272727 0.85454545 0.96363636
 0.96363636 0.90909091 0.92727273 0.94545455]

mean value: 0.9236363636363636

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.92       0.94545455 0.92592593 0.86792453 0.86206897 0.96428571
 0.96428571 0.91525424 0.93103448 0.94545455]

mean value: 0.924168865927233

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.92857143 0.92592593 0.88461538 0.80645161 0.96428571
 0.96428571 0.87096774 0.9        0.96296296]

mean value: 0.9208066485485841

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.85185185 0.96296296 0.92592593 0.85185185 0.92592593 0.96428571
 0.96428571 0.96428571 0.96428571 0.92857143]

mean value: 0.9304232804232804

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.92592593 0.9457672  0.92724868 0.8723545  0.85582011 0.96362434
 0.96362434 0.90806878 0.9265873  0.9457672 ]

mean value: 0.9234788359788361

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.85185185 0.89655172 0.86206897 0.76666667 0.75757576 0.93103448
 0.93103448 0.84375    0.87096774 0.89655172]

mean value: 0.8608053397340105

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.36

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10040808 0.10288572 0.1029706  0.10149503 0.09969401 0.10028243
 0.10137939 0.10140562 0.10217285 0.10103154]

mean value: 0.10137252807617188

key: score_time
value: [0.01739192 0.01827312 0.01805353 0.01727128 0.01716781 0.01719832
 0.01750255 0.01722026 0.01744318 0.01742435]

mean value: 0.017494630813598634

key: test_mcc
value: [0.78961518 0.82337971 0.71049701 0.92962225 0.72754449 0.8565805
 0.96428571 0.86334835 0.89602867 0.78174603]

mean value: 0.8342647911704605

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89090909 0.90909091 0.85454545 0.96363636 0.85454545 0.92727273
 0.98181818 0.92727273 0.94545455 0.89090909]

mean value: 0.9145454545454546

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.88       0.9122807  0.84615385 0.96153846 0.86666667 0.93103448
 0.98181818 0.93333333 0.94915254 0.89285714]

mean value: 0.915483535925352

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.95652174 0.86666667 0.88       1.         0.78787879 0.9
 1.         0.875      0.90322581 0.89285714]

mean value: 0.9062150142984645

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.81481481 0.96296296 0.81481481 0.92592593 0.96296296 0.96428571
 0.96428571 1.         1.         0.89285714]

mean value: 0.9302910052910053

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.88955026 0.91005291 0.85383598 0.96296296 0.85648148 0.9265873
 0.98214286 0.92592593 0.94444444 0.89087302]

mean value: 0.9142857142857143

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.78571429 0.83870968 0.73333333 0.92592593 0.76470588 0.87096774
 0.96428571 0.875      0.90322581 0.80645161]

mean value: 0.8468319980321878

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.32

Accuracy on Blind test: 0.71

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00785303 0.00793576 0.00822353 0.00800967 0.00797462 0.00781012
 0.00777817 0.00785923 0.00833726 0.0080986 ]

mean value: 0.00798799991607666

key: score_time
value: [0.00809956 0.00805473 0.00806642 0.00806236 0.00844526 0.00803256
 0.00801802 0.00814533 0.0084455  0.00861764]

mean value: 0.008198738098144531

key: test_mcc
value: [0.67602163 0.86402765 0.49468252 0.67284827 0.79069197 0.89139151
 0.92724868 0.81854376 0.81854376 0.34721618]

mean value: 0.7301215940014808

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.83636364 0.92727273 0.74545455 0.83636364 0.89090909 0.94545455
 0.96363636 0.90909091 0.90909091 0.67272727]

mean value: 0.8636363636363636

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.82352941 0.93103448 0.72       0.83018868 0.89655172 0.94736842
 0.96428571 0.9122807  0.9122807  0.7       ]

mean value: 0.8637519836753659

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.875      0.87096774 0.7826087  0.84615385 0.83870968 0.93103448
 0.96428571 0.89655172 0.89655172 0.65625   ]

mean value: 0.8558113606481056

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.77777778 1.         0.66666667 0.81481481 0.96296296 0.96428571
 0.96428571 0.92857143 0.92857143 0.75      ]

mean value: 0.8757936507936508

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.83531746 0.92857143 0.74404762 0.83597884 0.89219577 0.94510582
 0.96362434 0.90873016 0.90873016 0.6712963 ]

mean value: 0.8633597883597883

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.7        0.87096774 0.5625     0.70967742 0.8125     0.9
 0.93103448 0.83870968 0.83870968 0.53846154]

mean value: 0.7702560537349191

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.19

Accuracy on Blind test: 0.65

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.31766438 1.37345815 1.27461958 1.28742099 1.28492332 1.28699183
 1.29451942 1.28101182 1.29123402 1.28577328]

mean value: 1.2977616786956787

key: score_time
value: [0.09975529 0.15991688 0.09050679 0.09112549 0.0906496  0.09106612
 0.09059644 0.09088278 0.09059429 0.09115982]

mean value: 0.09862534999847412

key: test_mcc
value: [0.89602867 0.92980214 0.82269299 0.89139151 0.92980214 0.96423926
 0.96428571 0.89602867 0.96423926 0.92724868]

mean value: 0.9185759034850024

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94545455 0.96363636 0.90909091 0.94545455 0.96363636 0.98181818
 0.98181818 0.94545455 0.98181818 0.96363636]

mean value: 0.9581818181818181

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.96428571 0.90196078 0.94339623 0.96428571 0.98245614
 0.98181818 0.94915254 0.98245614 0.96428571]

mean value: 0.9575273629067016

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.93103448 0.95833333 0.96153846 0.93103448 0.96551724
 1.         0.90322581 0.96551724 0.96428571]

mean value: 0.9580486763884984

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.88888889 1.         0.85185185 0.92592593 1.         1.
 0.96428571 1.         1.         0.96428571]

mean value: 0.9595238095238096

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94444444 0.96428571 0.90806878 0.94510582 0.96428571 0.98148148
 0.98214286 0.94444444 0.98148148 0.96362434]

mean value: 0.957936507936508

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.93103448 0.82142857 0.89285714 0.93103448 0.96551724
 0.96428571 0.90322581 0.96551724 0.93103448]

mean value: 0.9194824054946413

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.2

Accuracy on Blind test: 0.51
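The per-fold test MCC arrays show how stable each model is across folds, not just its mean. The single Extra Tree ranges from about 0.35 to 0.93, while the 1000-tree Random Forest stays between about 0.82 and 0.96. The snippet below copies those two arrays from the blocks above and summarises them with NumPy. The "mean value" lines in the log correspond to the plain arithmetic mean of each array.

```python
import numpy as np

# Per-fold test_mcc values copied from the Extra Tree and Random Forest blocks.
extra_tree_mcc = np.array([0.67602163, 0.86402765, 0.49468252, 0.67284827, 0.79069197,
                           0.89139151, 0.92724868, 0.81854376, 0.81854376, 0.34721618])
random_forest_mcc = np.array([0.89602867, 0.92980214, 0.82269299, 0.89139151, 0.92980214,
                              0.96423926, 0.96428571, 0.89602867, 0.96423926, 0.92724868])

for name, folds in [("Extra Tree", extra_tree_mcc), ("Random Forest", random_forest_mcc)]:
    print(f"{name}: mean={folds.mean():.4f}  sd={folds.std(ddof=1):.4f}  "
          f"min={folds.min():.4f}  max={folds.max():.4f}")
```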

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
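The pipeline printed above can be rebuilt roughly as follows. This is a sketch, not the script's own code. The three-column DataFrame and its random values are hypothetical stand-ins for the 46 numerical and 7 categorical features reported earlier in the log. max_features='sqrt' replaces 'auto', which meant the same thing for classifiers and was removed in later scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical stand-in for the real feature table (column names borrowed from the log)
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "rsa": rng.uniform(0, 1, 60),
    "ss_class": rng.choice(["helix", "sheet", "loop"], 60),
})
y = rng.integers(0, 2, 60)

# Split columns by dtype, giving pandas Index objects as in the logged repr
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numerical column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one 0/1 column per category level
    ],
    remainder="passthrough",                 # any unlisted column is passed through unchanged
)

pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", RandomForestClassifier(
        max_features="sqrt",                 # equivalent of the logged 'auto' for classifiers
        min_samples_leaf=5, n_estimators=1000, n_jobs=10,
        oob_score=True, random_state=42)),
])
pipe.fit(X, y)
print(pipe.named_steps["model"].oob_score_)  # out-of-bag accuracy on the toy data
```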

key: fit_time
value: [0.91016841 0.90267611 0.96385503 0.89476395 0.87512565 0.92300725
 0.92452002 0.88380647 0.98288059 0.92339325]

mean value: 0.9184196710586547

key: score_time
value: [0.24338746 0.25723505 0.2364409  0.19419646 0.26264167 0.20185113
 0.21379876 0.27131295 0.2565093  0.25926137]

mean value: 0.2396635055541992
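Each key/value/mean block matches the layout of the dictionary that scikit-learn's cross_validate returns for a dict of named scorers with return_train_score=True. fit_time and score_time come first, followed by test_<name> and train_<name> for each scorer in turn. The sketch below assumes the script produced these blocks that way. The scorer names are inferred from the logged keys, and StratifiedKFold is an assumption. The synthetic data has 550 rows, which is the size the fold accuracies suggest (55 rows per test fold, 495 per training fold).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names inferred from the logged keys; the real script may name them differently
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; 10 stratified folds of 55 rows each
X, y = make_classification(n_samples=550, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")

# "mean value" is the arithmetic mean of the ten per-fold numbers (Random Forest2 fit_time)
rf2_fit_time = np.array([0.91016841, 0.90267611, 0.96385503, 0.89476395, 0.87512565,
                         0.92300725, 0.92452002, 0.88380647, 0.98288059, 0.92339325])
print(rf2_fit_time.mean())  # about 0.9184197; logged inputs are rounded
```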

key: test_mcc
value: [0.89602867 0.92980214 0.82269299 0.89139151 0.92980214 0.92724868
 0.96428571 0.89602867 0.96423926 0.92724868]

mean value: 0.914876845571549

key: train_mcc
value: [0.94766581 0.95154523 0.95574863 0.94766581 0.95163767 0.94754543
 0.94371421 0.95556354 0.9395879  0.94767006]

mean value: 0.9488344285838511

key: test_accuracy
value: [0.94545455 0.96363636 0.90909091 0.94545455 0.96363636 0.96363636
 0.98181818 0.94545455 0.98181818 0.96363636]

mean value: 0.9563636363636363

key: train_accuracy
value: [0.97373737 0.97575758 0.97777778 0.97373737 0.97575758 0.97373737
 0.97171717 0.97777778 0.96969697 0.97373737]

mean value: 0.9743434343434344

key: test_fscore
value: [0.94117647 0.96428571 0.90196078 0.94339623 0.96428571 0.96428571
 0.98181818 0.94915254 0.98245614 0.96428571]

mean value: 0.9557103203001853

key: train_fscore
value: [0.9740519  0.97590361 0.97804391 0.9740519  0.976      0.97384306
 0.972      0.97777778 0.96993988 0.9739479 ]

mean value: 0.974555993072763

key: test_precision
value: [1.         0.93103448 0.95833333 0.96153846 0.93103448 0.96428571
 1.         0.90322581 0.96551724 0.96428571]

mean value: 0.9579255236791389

key: train_precision
value: [0.96442688 0.972      0.96837945 0.96442688 0.96825397 0.968
 0.96047431 0.97580645 0.96031746 0.96428571]

mean value: 0.9666371104351469

key: test_recall
value: [0.88888889 1.         0.85185185 0.92592593 1.         0.96428571
 0.96428571 1.         1.         0.96428571]

mean value: 0.955952380952381

key: train_recall
value: [0.98387097 0.97983871 0.98790323 0.98387097 0.98387097 0.97975709
 0.98380567 0.97975709 0.97975709 0.98380567]

mean value: 0.9826237429802795

key: test_roc_auc
value: [0.94444444 0.96428571 0.90806878 0.94510582 0.96428571 0.96362434
 0.98214286 0.94444444 0.98148148 0.96362434]

mean value: 0.9561507936507937

key: train_roc_auc
value: [0.97371686 0.97574931 0.97775728 0.97371686 0.97574115 0.97374951
 0.97174154 0.97778177 0.96971725 0.97375767]

mean value: 0.9743429215097297

key: test_jcc
value: [0.88888889 0.93103448 0.82142857 0.89285714 0.93103448 0.93103448
 0.96428571 0.90322581 0.96551724 0.93103448]

mean value: 0.9160341296325724

key: train_jcc
value: [0.94941634 0.95294118 0.95703125 0.94941634 0.953125   0.94901961
 0.94552529 0.95652174 0.94163424 0.94921875]

mean value: 0.9503849741342992
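Every metric for a single fold can be recovered from that fold's confusion matrix. Consider fold 1 of Random Forest2. Its logged test values are accuracy 0.94545455, precision 1.0 and recall 0.88888889. These imply 55 test samples with TP=24, FN=3, FP=0 and TN=28. The label vectors below are hypothetical, built only to reproduce that fold, and each logged metric is recomputed from them. Note that test_roc_auc equals balanced accuracy here. This suggests the ROC AUC scorer received predicted labels rather than probabilities.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             jaccard_score, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical labels reproducing fold 1: 27 positives, 28 negatives; TP=24, FN=3, FP=0, TN=28
y_true = np.array([1] * 27 + [0] * 28)
y_pred = np.array([1] * 24 + [0] * 3 + [0] * 28)

fold1 = {
    "mcc": matthews_corrcoef(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
    "fscore": f1_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_pred),  # on hard labels this equals balanced accuracy
    "jcc": jaccard_score(y_true, y_pred),      # TP / (TP + FP + FN)
}
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")

# MCC by hand from the confusion counts
tp, fn, fp, tn = 24, 3, 0, 28
mcc_manual = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(np.isclose(mcc_manual, fold1["mcc"]))                                        # True
print(np.isclose(fold1["roc_auc"], balanced_accuracy_score(y_true, y_pred)))       # True
```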

MCC on Blind test: 0.2

Accuracy on Blind test: 0.52

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
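BernoulliNB models binary features. With its default binarize=0.0, it thresholds every input at zero before fitting, so values > 0 become 1 and values <= 0 become 0. Placed after MinMaxScaler, this reduces each numerical feature to "equal to the fold's training minimum or not". Held-out values below that minimum also map to 0. The sketch uses a toy column with hypothetical values and scikit-learn's Binarizer to show what BernoulliNB actually sees. A higher threshold is one alternative, but the log does not use it.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import Binarizer, MinMaxScaler

# Hypothetical numerical feature for five training rows
x = np.array([[3.0], [7.5], [1.0], [9.0], [4.2]])
y = np.array([0, 1, 0, 1, 1])

scaled = MinMaxScaler().fit_transform(x)   # (x - 1) / 8: min -> 0.0, max -> 1.0

# The thresholding BernoulliNB(binarize=0.0) applies internally: > 0 -> 1, <= 0 -> 0
as_seen_by_bnb = Binarizer(threshold=0.0).fit_transform(scaled)
print(as_seen_by_bnb.ravel())   # [1. 1. 0. 1. 1.]

# Fitting on the scaled values or on the pre-binarized values gives the same model
a = BernoulliNB(binarize=0.0).fit(scaled, y)
b = BernoulliNB(binarize=None).fit(as_seen_by_bnb, y)
print(np.allclose(a.feature_log_prob_, b.feature_log_prob_))   # True
```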

key: fit_time
value: [0.01822448 0.00774431 0.00775194 0.00768995 0.0077107  0.00761104
 0.00783753 0.00774765 0.00785279 0.00792027]

mean value: 0.008809065818786621

key: score_time
value: [0.01004004 0.00803638 0.00816011 0.00804186 0.00801206 0.00807762
 0.00851011 0.00811744 0.0079689  0.00807619]

mean value: 0.008304071426391602

key: test_mcc
value: [0.53452248 0.78410665 0.63745526 0.85449735 0.68504815 0.7112589
 0.85695439 0.78353876 0.85695439 0.63841116]

mean value: 0.7342747491731905

key: train_mcc
value: [0.75012681 0.77383014 0.7860094  0.72613214 0.77778141 0.74958366
 0.76193358 0.74958366 0.72166787 0.74199798]

mean value: 0.7538646637900721

key: test_accuracy
value: [0.76363636 0.89090909 0.81818182 0.92727273 0.83636364 0.85454545
 0.92727273 0.89090909 0.92727273 0.81818182]

mean value: 0.8654545454545455

key: train_accuracy
value: [0.87474747 0.88686869 0.89292929 0.86262626 0.88888889 0.87474747
 0.88080808 0.87474747 0.86060606 0.87070707]

mean value: 0.8767676767676768

key: test_fscore
value: [0.73469388 0.89285714 0.80769231 0.92592593 0.84745763 0.85185185
 0.92592593 0.89655172 0.92592593 0.81481481]

mean value: 0.862369712380149

key: train_fscore
value: [0.87242798 0.888      0.89421158 0.85950413 0.88933602 0.87346939
 0.88223553 0.87346939 0.85773196 0.8677686 ]

mean value: 0.8758154566969916

key: test_precision
value: [0.81818182 0.86206897 0.84       0.92592593 0.78125    0.88461538
 0.96153846 0.86666667 0.96153846 0.84615385]

mean value: 0.8747939530137806

key: train_precision
value: [0.8907563  0.88095238 0.88537549 0.88135593 0.8875502  0.88065844
 0.87007874 0.88065844 0.87394958 0.88607595]

mean value: 0.8817411452335624

key: test_recall
value: [0.66666667 0.92592593 0.77777778 0.92592593 0.92592593 0.82142857
 0.89285714 0.92857143 0.89285714 0.78571429]

mean value: 0.8543650793650793

key: train_recall
value: [0.85483871 0.89516129 0.90322581 0.83870968 0.89112903 0.86639676
 0.89473684 0.86639676 0.84210526 0.85020243]

mean value: 0.8702902572809195

key: test_roc_auc
value: [0.76190476 0.89153439 0.81746032 0.92724868 0.83796296 0.85515873
 0.92791005 0.89021164 0.92791005 0.81878307]

mean value: 0.8656084656084656

key: train_roc_auc
value: [0.87478778 0.8868519  0.89290845 0.86267468 0.88888435 0.87473064
 0.88083616 0.87473064 0.86056876 0.87066573]

mean value: 0.8767639088415828

key: test_jcc
value: [0.58064516 0.80645161 0.67741935 0.86206897 0.73529412 0.74193548
 0.86206897 0.8125     0.86206897 0.6875    ]

mean value: 0.7627952627102008

key: train_jcc
value: [0.77372263 0.79856115 0.80866426 0.75362319 0.80072464 0.77536232
 0.78928571 0.77536232 0.75090253 0.76642336]

mean value: 0.7792632101538037

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.24260426 0.04918981 0.05251241 0.05346513 0.05343127 0.05336189
 0.05402541 0.05282021 0.05996943 0.0555234 ]

mean value: 0.07269032001495361

key: score_time
value: [0.01044059 0.0108521  0.01054335 0.01029539 0.0097692  0.01019621
 0.01065063 0.01008081 0.00973535 0.01028442]

mean value: 0.010284805297851562

key: test_mcc
value: [0.89602867 0.96428571 0.89139151 0.89139151 0.92724868 0.96423926
 0.96423926 0.89602867 0.92962225 0.92724868]

mean value: 0.9251724200737174

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94545455 0.98181818 0.94545455 0.94545455 0.96363636 0.98181818
 0.98181818 0.94545455 0.96363636 0.96363636]

mean value: 0.9618181818181818

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.98181818 0.94339623 0.94339623 0.96296296 0.98245614
 0.98245614 0.94915254 0.96551724 0.96428571]

mean value: 0.9616617846939229

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96153846 0.96153846 0.96296296 0.96551724
 0.96551724 0.90322581 0.93333333 0.96428571]

mean value: 0.9582204937154881

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.88888889 1.         0.92592593 0.92592593 0.96296296 1.
 1.         1.         1.         0.96428571]

mean value: 0.9667989417989418

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94444444 0.98214286 0.94510582 0.94510582 0.96362434 0.98148148
 0.98148148 0.94444444 0.96296296 0.96362434]

mean value: 0.9614417989417989

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.96428571 0.89285714 0.89285714 0.92857143 0.96551724
 0.96551724 0.90322581 0.93333333 0.93103448]

mean value: 0.9266088422762505

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.38

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
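The XGBClassifier entry in this list shows concrete parameter values such as base_score=0.5 and tree_method='exact'. In the XGBoost run's Model func line, the same parameters are all None. One plausible reading, not confirmed by the log, is that the estimator object stored in the list was fitted in place. The reason would be that scikit-learn's Pipeline fits the step objects it is given rather than copies, whereas cross_validate clones the estimator for each fold. The sketch below shows the difference with LogisticRegression as a stand-in. sklearn.base.clone is the usual way to get a fresh, unfitted copy.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# A models list in the style of the log: each tuple holds a reference to the estimator object
models = [("Logistic Regression", LogisticRegression(random_state=42))]
name, model = models[0]

pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", model)])
pipe.fit(X, y)
print(hasattr(models[0][1], "coef_"))   # True: the object inside the list is now fitted

fresh = clone(models[0][1])             # copies constructor parameters only
print(hasattr(fresh, "coef_"))          # False: unfitted
```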
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01531887 0.04107666 0.04202557 0.04105949 0.04171562 0.01796556
 0.01756859 0.0427742  0.04265285 0.01785755]

mean value: 0.032001495361328125

key: score_time
value: [0.01036215 0.02154422 0.01961827 0.02058625 0.02116394 0.01096559
 0.01085567 0.02103901 0.02116776 0.01119232]

mean value: 0.016849517822265625

key: test_mcc
value: [0.67284827 0.78410665 0.64214885 0.89153439 0.78410665 0.89139151
 0.89153439 0.8565805  0.92724868 0.78353876]

mean value: 0.8125038646516862

key: train_mcc
value: [0.8435716  0.85067196 0.87981045 0.85892085 0.85478898 0.86365469
 0.85916382 0.86702055 0.85107823 0.84299263]

mean value: 0.8571673756575152

key: test_accuracy
value: [0.83636364 0.89090909 0.81818182 0.94545455 0.89090909 0.94545455
 0.94545455 0.92727273 0.96363636 0.89090909]

mean value: 0.9054545454545454

key: train_accuracy
value: [0.92121212 0.92525253 0.93939394 0.92929293 0.92727273 0.93131313
 0.92929293 0.93333333 0.92525253 0.92121212]

mean value: 0.9282828282828283

key: test_fscore
value: [0.83018868 0.89285714 0.8        0.94545455 0.89285714 0.94736842
 0.94545455 0.93103448 0.96428571 0.89655172]

mean value: 0.9046052398103557

key: train_fscore
value: [0.92337917 0.9261477  0.94094488 0.9304175  0.92828685 0.93280632
 0.9304175  0.93413174 0.92644135 0.92246521]

mean value: 0.9295438225256318

key: test_precision
value: [0.84615385 0.86206897 0.86956522 0.92857143 0.86206897 0.93103448
 0.96296296 0.9        0.96428571 0.86666667]

mean value: 0.8993378249825026

key: train_precision
value: [0.90038314 0.91699605 0.91923077 0.91764706 0.91732283 0.91119691
 0.9140625  0.92125984 0.91015625 0.90625   ]

mean value: 0.9134505355609847

key: test_recall
value: [0.81481481 0.92592593 0.74074074 0.96296296 0.92592593 0.96428571
 0.92857143 0.96428571 0.96428571 0.92857143]

mean value: 0.912037037037037

key: train_recall
value: [0.94758065 0.93548387 0.96370968 0.94354839 0.93951613 0.95546559
 0.94736842 0.94736842 0.94331984 0.93927126]

mean value: 0.9462632231944625

key: test_roc_auc
value: [0.83597884 0.89153439 0.81679894 0.9457672  0.89153439 0.94510582
 0.9457672  0.9265873  0.96362434 0.89021164]

mean value: 0.9052910052910054

key: train_roc_auc
value: [0.92115874 0.92523181 0.93934472 0.92926407 0.92724794 0.93136183
 0.92932937 0.93336163 0.92528895 0.92124853]

mean value: 0.9282837599582081

key: test_jcc
value: [0.70967742 0.80645161 0.66666667 0.89655172 0.80645161 0.9
 0.89655172 0.87096774 0.93103448 0.8125    ]

mean value: 0.8296852984797923

key: train_jcc
value: [0.85766423 0.86245353 0.88847584 0.86988848 0.866171   0.87407407
 0.86988848 0.87640449 0.86296296 0.85608856]

mean value: 0.8684071649301385

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])
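MultinomialNB treats features as non-negative counts, and its fit() raises a ValueError on negative input. In this pipeline, each training fold is min-max scaled into [0, 1] and the one-hot columns are 0/1, so fitting succeeds. Values in a test fold or the blind set can fall outside the training range. Those are scaled below 0 or above 1, and predict() does not re-check them. MinMaxScaler(clip=True) is one way to bound them. Whether any such values occurred here is not visible in the log. The toy arrays below are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[0.0], [5.0]])             # outside the training range [1, 4]

plain = MinMaxScaler().fit(X_train)
clipped = MinMaxScaler(clip=True).fit(X_train)
print(plain.transform(X_train).ravel())      # training rows land in [0, 1]
print(plain.transform(X_new).ravel())        # roughly [-0.333, 1.333]
print(clipped.transform(X_new).ravel())      # [0. 1.]

# fit() rejects negative feature values ...
try:
    MultinomialNB().fit(np.array([[-1.0], [1.0]]), np.array([0, 1]))
    fit_rejected = False
except ValueError:
    fit_rejected = True
print(fit_rejected)                          # True

# ... while predict() on out-of-range, unclipped values runs without an error
nb = MultinomialNB().fit(plain.transform(X_train), y_train)
print(nb.predict(plain.transform(X_new)))
```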

key: fit_time
value: [0.01084638 0.00925279 0.00854015 0.0082531  0.0082016  0.00828433
 0.00811386 0.0079782  0.00824904 0.00820422]

mean value: 0.008592367172241211

key: score_time
value: [0.01097393 0.00891614 0.00880241 0.00852537 0.00860667 0.0081892
 0.00847697 0.00831318 0.00843    0.00821924]

mean value: 0.008745312690734863

key: test_mcc
value: [0.60876172 0.78410665 0.63745526 0.89153439 0.71735629 0.81854376
 0.85695439 0.82269299 0.89139151 0.70899471]

mean value: 0.7737791675722022

key: train_mcc
value: [0.79012008 0.77375802 0.79409222 0.76162335 0.77778141 0.77376541
 0.76589215 0.76970043 0.76565561 0.78592069]

mean value: 0.7758309364223852

key: test_accuracy
value: [0.8        0.89090909 0.81818182 0.94545455 0.85454545 0.90909091
 0.92727273 0.90909091 0.94545455 0.85454545]

mean value: 0.8854545454545455

key: train_accuracy
value: [0.89494949 0.88686869 0.8969697  0.88080808 0.88888889 0.88686869
 0.88282828 0.88484848 0.88282828 0.89292929]

mean value: 0.8878787878787879

key: test_fscore
value: [0.7755102  0.89285714 0.80769231 0.94545455 0.86206897 0.9122807
 0.92592593 0.91525424 0.94736842 0.85714286]

mean value: 0.8841555308766806

key: train_fscore
value: [0.89641434 0.8875502  0.89820359 0.88080808 0.88933602 0.88709677
 0.884      0.88438134 0.88259109 0.89336016]

mean value: 0.8883741600170872

key: test_precision
value: [0.86363636 0.86206897 0.84       0.92857143 0.80645161 0.89655172
 0.96153846 0.87096774 0.93103448 0.85714286]

mean value: 0.8817963638141614

key: train_precision
value: [0.88582677 0.884      0.88932806 0.88259109 0.8875502  0.88353414
 0.87351779 0.88617886 0.88259109 0.888     ]

mean value: 0.8843118006828748

key: test_recall
value: [0.7037037  0.92592593 0.77777778 0.96296296 0.92592593 0.92857143
 0.89285714 0.96428571 0.96428571 0.85714286]

mean value: 0.8903439153439153

key: train_recall
value: [0.90725806 0.89112903 0.90725806 0.87903226 0.89112903 0.89068826
 0.89473684 0.88259109 0.88259109 0.89878543]

mean value: 0.8925199164163511

key: test_roc_auc
value: [0.79828042 0.89153439 0.81746032 0.9457672  0.85582011 0.90873016
 0.92791005 0.90806878 0.94510582 0.85449735]

mean value: 0.8853174603174603

key: train_roc_auc
value: [0.89492458 0.88686006 0.89694887 0.88081168 0.88888435 0.88687639
 0.88285229 0.88484393 0.8828278  0.8929411 ]

mean value: 0.8878771059161551

key: test_jcc
value: [0.63333333 0.80645161 0.67741935 0.89655172 0.75757576 0.83870968
 0.86206897 0.84375    0.9        0.75      ]

mean value: 0.7965860425725554

key: train_jcc
value: [0.81227437 0.79783394 0.81521739 0.78700361 0.80072464 0.79710145
 0.7921147  0.79272727 0.78985507 0.80727273]

mean value: 0.7992125159422541

MCC on Blind test: 0.29

Accuracy on Blind test: 0.71
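For every model so far, the blind-test MCC is far below the cross-validated MCC. The gap is widest for XGBoost, which also fits every training fold perfectly (all of its train_* scores are 1.0). The log alone cannot say whether this reflects overfitting, a shift between the training data and the blind set, or both. The small summary below only tabulates numbers copied from the blocks above.

```python
# Mean train/test MCC over the 10 CV folds and the blind-test MCC, copied from this log
results = {
    "Random Forest2": (0.9488344285838511, 0.914876845571549, 0.20),
    "Naive Bayes":    (0.7538646637900721, 0.7342747491731905, 0.29),
    "XGBoost":        (1.0, 0.9251724200737174, 0.07),
    "LDA":            (0.8571673756575152, 0.8125038646516862, 0.25),
    "Multinomial":    (0.7758309364223852, 0.7737791675722022, 0.29),
}

print(f"{'model':15s} {'train':>6s} {'cv':>6s} {'blind':>6s} {'cv-blind':>9s}")
for name, (train, cv, blind) in results.items():
    print(f"{name:15s} {train:6.2f} {cv:6.2f} {blind:6.2f} {cv - blind:9.2f}")

# Model whose cross-validated MCC overstates the blind-test MCC the most
largest_drop = max(results, key=lambda m: results[m][1] - results[m][2])
print("largest CV-to-blind drop:", largest_drop)   # XGBoost
```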

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0109973  0.01174498 0.01200843 0.01177812 0.01327801 0.01348186
 0.01264    0.01432228 0.02716994 0.01318049]

mean value: 0.014060139656066895

key: score_time
value: [0.00866151 0.01013803 0.01016569 0.01051307 0.01054215 0.01067781
 0.01127958 0.01156712 0.02180862 0.01062226]

mean value: 0.011597585678100587

key: test_mcc
value: [0.75724019 0.78353876 0.67602163 0.75878131 0.71588202 0.89153439
 0.96428571 0.89602867 0.92724868 0.92980214]

mean value: 0.830036349769513

key: train_mcc
value: [0.90767739 0.75673387 0.89988762 0.82550688 0.81837405 0.89599275
 0.84921709 0.857966   0.89212884 0.86668482]

mean value: 0.857016930626917

key: test_accuracy
value: [0.87272727 0.89090909 0.83636364 0.87272727 0.85454545 0.94545455
 0.98181818 0.94545455 0.96363636 0.96363636]

mean value: 0.9127272727272727

key: train_accuracy
value: [0.95353535 0.87474747 0.94949495 0.90909091 0.90505051 0.94747475
 0.92323232 0.92727273 0.94545455 0.93131313]

mean value: 0.9266666666666666

key: test_fscore
value: [0.85714286 0.88461538 0.82352941 0.88135593 0.84       0.94545455
 0.98181818 0.94915254 0.96428571 0.96296296]

mean value: 0.9090317532620623

key: train_fscore
value: [0.95277207 0.86580087 0.94845361 0.91493384 0.89804772 0.94605809
 0.91983122 0.93023256 0.94386694 0.92765957]

mean value: 0.9247656499131667

key: test_precision
value: [0.95454545 0.92       0.875      0.8125     0.91304348 0.96296296
 1.         0.90322581 0.96428571 1.        ]

mean value: 0.9305563416506615

key: train_precision
value: [0.9707113  0.93457944 0.97046414 0.86120996 0.97183099 0.97021277
 0.96035242 0.89219331 0.97008547 0.97757848]

mean value: 0.9479218264509782

key: test_recall
value: [0.77777778 0.85185185 0.77777778 0.96296296 0.77777778 0.92857143
 0.96428571 1.         0.96428571 0.92857143]

mean value: 0.8933862433862434

key: train_recall
value: [0.93548387 0.80645161 0.92741935 0.97580645 0.83467742 0.92307692
 0.88259109 0.97165992 0.91902834 0.88259109]

mean value: 0.9058786078098472

key: test_roc_auc
value: [0.87103175 0.89021164 0.83531746 0.87433862 0.8531746  0.9457672
 0.98214286 0.94444444 0.96362434 0.96428571]

mean value: 0.9124338624338624

key: train_roc_auc
value: [0.95357189 0.87488573 0.94953964 0.90895586 0.90519296 0.94742556
 0.92315039 0.92736222 0.94540127 0.9312149 ]

mean value: 0.92667004048583

key: test_jcc
value: [0.75       0.79310345 0.7        0.78787879 0.72413793 0.89655172
 0.96428571 0.90322581 0.93103448 0.92857143]

mean value: 0.837878932339444
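The `test_jcc` rows can be cross-checked against `test_fscore`. For a binary task, F1 = 2TP / (2TP + FP + FN) and Jaccard = TP / (TP + FP + FN), which rearranges to Jaccard = F1 / (2 - F1). A minimal sketch, assuming both scorers used scikit-learn's default binary averaging on the same positive class; `jaccard_from_f1` is a hypothetical helper, and the fold values are copied from the Passive Aggressive entry above.

```python
# Assumption: test_fscore and test_jcc were both computed with binary
# averaging on the same positive class (scikit-learn's default for 0/1 labels).
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the equivalent Jaccard index."""
    # F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN)
    # together give J = F1 / (2 - F1).
    return f1 / (2.0 - f1)

# First three Passive Aggressive folds, copied from the log above.
test_fscore = [0.85714286, 0.88461538, 0.82352941]
test_jcc = [0.75, 0.79310345, 0.7]

for f1, jcc in zip(test_fscore, test_jcc):
    print(f"F1={f1:.4f}  J from F1={jaccard_from_f1(f1):.4f}  logged J={jcc:.4f}")
```

The identity holds fold by fold, not for the averages: converting the mean F1 (0.909) gives about 0.833 rather than the logged mean Jaccard of 0.838, because the mapping is non-linear.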

key: train_jcc
value: [0.90980392 0.76335878 0.90196078 0.84320557 0.81496063 0.8976378
 0.8515625  0.86956522 0.89370079 0.86507937]

mean value: 0.8610835354490294

MCC on Blind test: 0.23

Accuracy on Blind test: 0.64

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0145483  0.01222491 0.01315165 0.01257706 0.01379681 0.01266575
 0.0136075  0.01250958 0.014539   0.01288724]

mean value: 0.013250780105590821

key: score_time
value: [0.01066113 0.01051998 0.01067948 0.01066899 0.01064396 0.01054311
 0.01067638 0.01064086 0.01070881 0.01063132]

mean value: 0.010637402534484863

key: test_mcc
value: [0.81854376 0.82269299 0.71049701 0.83147942 0.74935731 0.83147942
 0.8565805  0.80032673 0.85695439 0.83251448]

mean value: 0.8110426022713871

key: train_mcc
value: [0.89986978 0.78532023 0.91930903 0.90350829 0.79835384 0.80158821
 0.82922447 0.76325368 0.8535924  0.78649322]

mean value: 0.8340513160102464

key: test_accuracy
value: [0.90909091 0.90909091 0.85454545 0.90909091 0.87272727 0.90909091
 0.92727273 0.89090909 0.92727273 0.90909091]

mean value: 0.9018181818181817

key: train_accuracy
value: [0.94949495 0.88484848 0.95959596 0.95151515 0.89090909 0.89494949
 0.91111111 0.87272727 0.92323232 0.88484848]

mean value: 0.9123232323232323

key: test_fscore
value: [0.90566038 0.90196078 0.84615385 0.89795918 0.8627451  0.91803279
 0.93103448 0.90322581 0.92592593 0.90196078]

mean value: 0.8994659075873878

key: train_fscore
value: [0.95069034 0.87248322 0.96       0.95081967 0.87892377 0.90298507
 0.91634981 0.88482633 0.91774892 0.87133183]

mean value: 0.9106158951845009

key: test_precision
value: [0.92307692 0.95833333 0.88       1.         0.91666667 0.84848485
 0.9        0.82352941 0.96153846 1.        ]

mean value: 0.9211629644864939

key: train_precision
value: [0.93050193 0.9798995  0.95238095 0.96666667 0.98989899 0.83737024
 0.86379928 0.80666667 0.98604651 0.98469388]

mean value: 0.9297924618150225

key: test_recall
value: [0.88888889 0.85185185 0.81481481 0.81481481 0.81481481 1.
 0.96428571 1.         0.89285714 0.82142857]

mean value: 0.8863756613756614

key: train_recall
value: [0.97177419 0.78629032 0.96774194 0.93548387 0.79032258 0.97975709
 0.9757085  0.97975709 0.8582996  0.78137652]

mean value: 0.9026511688650908

key: test_roc_auc
value: [0.90873016 0.90806878 0.85383598 0.90740741 0.87169312 0.90740741
 0.9265873  0.88888889 0.92791005 0.91071429]

mean value: 0.9011243386243386

key: train_roc_auc
value: [0.94944985 0.885048   0.95957947 0.9515476  0.89111271 0.89512048
 0.91124135 0.87294306 0.92310141 0.88463987]

mean value: 0.9123783792608071

key: test_jcc
value: [0.82758621 0.82142857 0.73333333 0.81481481 0.75862069 0.84848485
 0.87096774 0.82352941 0.86206897 0.82142857]

mean value: 0.8182263155259295

key: train_jcc
value: [0.90601504 0.77380952 0.92307692 0.90625    0.784      0.82312925
 0.84561404 0.79344262 0.848      0.772     ]

mean value: 0.8375337394219651

MCC on Blind test: 0.23

Accuracy on Blind test: 0.65

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10882998 0.09430504 0.09354877 0.09708071 0.09502983 0.09874582
 0.09948301 0.10101175 0.10146403 0.1005578 ]

mean value: 0.09900567531585694

key: score_time
value: [0.01448226 0.01464081 0.01445723 0.01509094 0.01563764 0.01575255
 0.01563501 0.01564932 0.01564193 0.01550007]

mean value: 0.015248775482177734

key: test_mcc
value: [0.89602867 0.92980214 0.74569602 0.89139151 0.96428571 0.96423926
 0.92724868 0.89602867 0.96423926 0.92724868]

mean value: 0.9106208597080531

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.94545455 0.96363636 0.87272727 0.94545455 0.98181818 0.98181818
 0.96363636 0.94545455 0.98181818 0.96363636]

mean value: 0.9545454545454546

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94117647 0.96428571 0.86792453 0.94339623 0.98181818 0.98245614
 0.96428571 0.94915254 0.98245614 0.96428571]

mean value: 0.9541237373055177

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.93103448 0.88461538 0.96153846 0.96428571 0.96551724
 0.96428571 0.90322581 0.96551724 0.96428571]

mean value: 0.9504305760979843

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.88888889 1.         0.85185185 0.92592593 1.         1.
 0.96428571 1.         1.         0.96428571]

mean value: 0.9595238095238096

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.94444444 0.96428571 0.8723545  0.94510582 0.98214286 0.98148148
 0.96362434 0.94444444 0.98148148 0.96362434]

mean value: 0.9542989417989418

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.88888889 0.93103448 0.76666667 0.89285714 0.96428571 0.96551724
 0.93103448 0.90322581 0.96551724 0.93103448]

mean value: 0.9140062150184508

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.13

Accuracy on Blind test: 0.4

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
[the two warnings above were emitted 11 times in total]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
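The OOB warnings printed for the Bagging Classifier are expected with its default settings: `n_estimators` defaults to 10, and each bootstrap sample (drawn with replacement at the default `max_samples=1.0`) contains a given row with probability of about 0.632. A row receives an OOB prediction only if at least one estimator left it out, so a few rows per fold get none; their OOB probabilities then become 0/0, which is what the `true_divide` RuntimeWarning reports. The sketch below estimates how many such rows to expect. The training size of 495 rows per fold is inferred from the train accuracies (all multiples of 1/495), and `expected_rows_without_oob` is a hypothetical helper.

```python
def expected_rows_without_oob(n_rows: int, n_estimators: int) -> float:
    """Expected number of training rows that every bootstrap sample contains."""
    # Probability that a single bootstrap sample of size n_rows, drawn with
    # replacement, includes a given row at least once: 1 - (1 - 1/n)^n.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_rows) ** n_rows
    # A row has no OOB prediction only if all estimators trained on it.
    p_never_oob = p_in_bag ** n_estimators
    return n_rows * p_never_oob

# 495 training rows per fold is an inference from this log, not a logged value.
for n_est in (10, 50, 100):
    print(n_est, expected_rows_without_oob(495, n_est))
```

With 10 estimators this gives roughly five rows per fold with no OOB estimate. Raising `n_estimators` (for example to 100) makes the expected count negligible and should silence both warnings; whether it also changes the cross-validated scores is not shown in this log.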

key: fit_time
value: [0.04016852 0.0324285  0.02866578 0.0298512  0.03131413 0.04710269
 0.03221202 0.03381968 0.03193164 0.03873563]

mean value: 0.03462297916412353

key: score_time
value: [0.02359104 0.02912378 0.0166347  0.03185606 0.01757717 0.01780367
 0.02005625 0.01775956 0.02499628 0.03069973]

mean value: 0.023009824752807616

key: test_mcc
value: [0.96423926 0.92980214 0.8565805  0.85449735 0.92980214 0.96423926
 1.         0.89602867 0.96423926 0.89153439]

mean value: 0.9250962972081643

key: train_mcc
value: [0.99195168 0.9838707  0.97980606 0.9878869  0.98383832 0.97980573
 0.97980606 0.99596768 0.97172522 0.98795103]

mean value: 0.9842609370911118

key: test_accuracy
value: [0.98181818 0.96363636 0.92727273 0.92727273 0.96363636 0.98181818
 1.         0.94545455 0.98181818 0.94545455]

mean value: 0.9618181818181818

key: train_accuracy
value: [0.9959596  0.99191919 0.98989899 0.99393939 0.99191919 0.98989899
 0.98989899 0.9979798  0.98585859 0.99393939]

mean value: 0.9921212121212121

key: test_fscore
value: [0.98113208 0.96428571 0.92307692 0.92592593 0.96428571 0.98245614
 1.         0.94915254 0.98245614 0.94545455]

mean value: 0.9618225721575157

key: train_fscore
value: [0.99595142 0.99190283 0.98989899 0.99393939 0.99193548 0.98985801
 0.98989899 0.9979716  0.98585859 0.99389002]

mean value: 0.9921105329450134

key: test_precision
value: [1.         0.93103448 0.96       0.92592593 0.93103448 0.96551724
 1.         0.90322581 0.96551724 0.96296296]

mean value: 0.9545218143616364

key: train_precision
value: [1.         0.99593496 0.99190283 0.99595142 0.99193548 0.99186992
 0.98790323 1.         0.98387097 1.        ]

mean value: 0.9939368806480281

key: test_recall
value: [0.96296296 1.         0.88888889 0.92592593 1.         1.
 1.         1.         1.         0.92857143]

mean value: 0.9706349206349206

key: train_recall
value: [0.99193548 0.98790323 0.98790323 0.99193548 0.99193548 0.98785425
 0.99190283 0.99595142 0.98785425 0.98785425]

mean value: 0.990302990727439

key: test_roc_auc
value: [0.98148148 0.96428571 0.9265873  0.92724868 0.96428571 0.98148148
 1.         0.94444444 0.98148148 0.9457672 ]

mean value: 0.9617063492063492

key: train_roc_auc
value: [0.99596774 0.99192732 0.98990303 0.99394345 0.99191916 0.98989487
 0.98990303 0.99797571 0.98586261 0.99392713]

mean value: 0.9921224043359018

key: test_jcc
value: [0.96296296 0.93103448 0.85714286 0.86206897 0.93103448 0.96551724
 1.         0.90322581 0.96551724 0.89655172]

mean value: 0.9275055764488467

key: train_jcc
value: [0.99193548 0.98393574 0.98       0.98795181 0.984      0.97991968
 0.98       0.99595142 0.97211155 0.98785425]

mean value: 0.9843659934587685

MCC on Blind test: 0.1

Accuracy on Blind test: 0.34

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.09521818 0.14583254 0.16052413 0.17061996 0.16926599 0.16544843
 0.14057255 0.18005562 0.18195176 0.14595151]

mean value: 0.15554406642913818

key: score_time
value: [0.01243162 0.02033305 0.01982188 0.02640724 0.02747393 0.01361799
 0.02945495 0.02985644 0.0272851  0.0134213 ]

mean value: 0.02201035022735596

key: test_mcc
value: [0.64214885 0.78410665 0.60000053 0.89139151 0.68504815 0.8565805
 0.85695439 0.86334835 0.89139151 0.74569602]

mean value: 0.781666645397211

key: train_mcc
value: [0.86751154 0.84656958 0.86751154 0.85498218 0.85478898 0.83883199
 0.84716822 0.85466123 0.85085332 0.85500107]

mean value: 0.8537879647323043

key: test_accuracy
value: [0.81818182 0.89090909 0.8        0.94545455 0.83636364 0.92727273
 0.92727273 0.92727273 0.94545455 0.87272727]

mean value: 0.889090909090909

key: train_accuracy
value: [0.93333333 0.92323232 0.93333333 0.92727273 0.92727273 0.91919192
 0.92323232 0.92727273 0.92525253 0.92727273]

mean value: 0.9266666666666666

key: test_fscore
value: [0.8        0.89285714 0.79245283 0.94339623 0.84745763 0.93103448
 0.92592593 0.93333333 0.94736842 0.87719298]

mean value: 0.8891018972106213

key: train_fscore
value: [0.93491124 0.924      0.93491124 0.92857143 0.92828685 0.92031873
 0.92460317 0.92771084 0.9261477  0.92828685]

mean value: 0.92777480666249

key: test_precision
value: [0.86956522 0.86206897 0.80769231 0.96153846 0.78125    0.9
 0.96153846 0.875      0.93103448 0.86206897]

mean value: 0.8811756861953639

key: train_precision
value: [0.91505792 0.91666667 0.91505792 0.9140625  0.91732283 0.90588235
 0.90661479 0.92031873 0.91338583 0.91372549]

mean value: 0.9138095012428894

key: test_recall
value: [0.74074074 0.92592593 0.77777778 0.92592593 0.92592593 0.96428571
 0.89285714 1.         0.96428571 0.89285714]

mean value: 0.9010582010582011

key: train_recall
value: [0.95564516 0.93145161 0.95564516 0.94354839 0.93951613 0.93522267
 0.94331984 0.93522267 0.93927126 0.94331984]

mean value: 0.9422162726916548

key: test_roc_auc
value: [0.81679894 0.89153439 0.79960317 0.94510582 0.83796296 0.9265873
 0.92791005 0.92592593 0.94510582 0.8723545 ]

mean value: 0.8888888888888888

key: train_roc_auc
value: [0.93328817 0.92321568 0.93328817 0.92723978 0.92724794 0.91922424
 0.92327282 0.92728876 0.92528079 0.92730508]

mean value: 0.9266651430063993

key: test_jcc
value: [0.66666667 0.80645161 0.65625    0.89285714 0.73529412 0.87096774
 0.86206897 0.875      0.9        0.78125   ]

mean value: 0.804680624752682

key: train_jcc
value: [0.87777778 0.85873606 0.87777778 0.86666667 0.866171   0.85239852
 0.8597786  0.86516854 0.86245353 0.866171  ]

mean value: 0.8653099481832294

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72
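Across the five models in this section, the mean cross-validated test MCC lies between 0.78 and 0.93, while the blind-test MCC is 0.29 or lower. The ordering also inverts: Bagging has the highest CV MCC (0.925) and the lowest blind-test MCC (0.10), AdaBoost fits every training fold perfectly (train MCC 1.0) yet scores 0.13 on the blind set, and Gaussian Process, with the lowest CV MCC, scores best on the blind set. A small sketch that tabulates these values (copied from this log) and ranks the models by blind-test MCC; `rank_by_blind_mcc` is a hypothetical helper.

```python
# Mean CV test MCC and blind-test MCC, copied from the log entries above.
results = {
    "Passive Aggresive": (0.830036349769513, 0.23),
    "Stochastic GDescent": (0.8110426022713871, 0.23),
    "AdaBoost Classifier": (0.9106208597080531, 0.13),
    "Bagging Classifier": (0.9250962972081643, 0.10),
    "Gaussian Process": (0.781666645397211, 0.29),
}

def rank_by_blind_mcc(res):
    """Return (model, cv_mcc, blind_mcc, gap) rows sorted by blind-test MCC."""
    rows = [(name, cv, blind, cv - blind) for name, (cv, blind) in res.items()]
    # sorted() is stable, so models tied on blind MCC keep their log order.
    return sorted(rows, key=lambda r: r[2], reverse=True)

for name, cv, blind, gap in rank_by_blind_mcc(results):
    print(f"{name:<22} CV={cv:.3f}  blind={blind:.2f}  gap={gap:.3f}")
```

A gap this large is consistent with overfitting or with the blind set following a different distribution from the training data; the log alone does not say which.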

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.25400209 0.25350809 0.25353813 0.25287437 0.24922609 0.24631763
 0.25787902 0.25717902 0.25663543 0.25559354]

mean value: 0.25367534160614014

key: score_time
value: [0.00894332 0.00885177 0.00907326 0.00875664 0.00890183 0.00927663
 0.00877619 0.00920916 0.00972295 0.00899959]

mean value: 0.009051132202148437

key: test_mcc
value: [0.92962225 0.96428571 0.89139151 0.89139151 0.89153439 0.96423926
 0.96423926 0.89602867 0.96423926 0.92724868]

mean value: 0.9284220498029769

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96363636 0.98181818 0.94545455 0.94545455 0.94545455 0.98181818
 0.98181818 0.94545455 0.98181818 0.96363636]

mean value: 0.9636363636363636

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96153846 0.98181818 0.94339623 0.94339623 0.94545455 0.98245614
 0.98245614 0.94915254 0.98245614 0.96428571]

mean value: 0.9636410319352604

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96428571 0.96153846 0.96153846 0.92857143 0.96551724
 0.96551724 0.90322581 0.96551724 0.96428571]

mean value: 0.9579997310809324

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92592593 1.         0.92592593 0.92592593 0.96296296 1.
 1.         1.         1.         0.96428571]

mean value: 0.9705026455026455

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96296296 0.98214286 0.94510582 0.94510582 0.9457672  0.98148148
 0.98148148 0.94444444 0.98148148 0.96362434]

mean value: 0.9633597883597884

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.92592593 0.96428571 0.89285714 0.89285714 0.89655172 0.96551724
 0.96551724 0.90322581 0.96551724 0.93103448]

mean value: 0.9303289663412022

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.3

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01195788 0.0138123  0.01438642 0.01398134 0.014189   0.01624131
 0.01444888 0.0144527  0.01394033 0.0147655 ]

mean value: 0.014217567443847657

key: score_time
value: [0.01118398 0.01099157 0.01105928 0.01112914 0.01111221 0.01204634
 0.01111531 0.01175618 0.01196051 0.01118875]

mean value: 0.011354327201843262

key: test_mcc
value: [0.35634832 0.71735629 0.65060574 0.68300095 0.6005291  0.47230166
 0.70899471 0.71735629 0.69688314 0.49734925]

mean value: 0.6100725452940903

key: train_mcc
value: [0.68737636 0.74945491 0.78935739 0.77491061 0.8187082  0.79126011
 0.79359843 0.78561297 0.78312126 0.78838114]

mean value: 0.7761781373175624

key: test_accuracy
value: [0.65454545 0.85454545 0.81818182 0.83636364 0.8        0.72727273
 0.85454545 0.85454545 0.83636364 0.74545455]

mean value: 0.7981818181818182

key: train_accuracy
value: [0.82626263 0.87070707 0.89090909 0.88282828 0.90707071 0.89292929
 0.89494949 0.88888889 0.88888889 0.89090909]

mean value: 0.8834343434343435

key: test_fscore
value: [0.71641791 0.86206897 0.79166667 0.81632653 0.8        0.69387755
 0.85714286 0.84615385 0.81632653 0.73076923]

mean value: 0.7930750088942501

key: train_fscore
value: [0.85017422 0.86086957 0.88311688 0.87336245 0.90212766 0.88602151
 0.8893617  0.88017429 0.88172043 0.88311688]

mean value: 0.8790045582018875

key: test_precision
value: [0.6        0.80645161 0.9047619  0.90909091 0.78571429 0.80952381
 0.85714286 0.91666667 0.95238095 0.79166667]

mean value: 0.8333399664851278

key: train_precision
value: [0.74846626 0.93396226 0.95327103 0.95238095 0.95495495 0.94495413
 0.93721973 0.95283019 0.94036697 0.94883721]

mean value: 0.9267243687033652

key: test_recall
value: [0.88888889 0.92592593 0.7037037  0.74074074 0.81481481 0.60714286
 0.85714286 0.78571429 0.71428571 0.67857143]

mean value: 0.7716931216931217

key: train_recall
value: [0.98387097 0.7983871  0.82258065 0.80645161 0.85483871 0.8340081
 0.84615385 0.81781377 0.82995951 0.82591093]

mean value: 0.8419975186104218

key: test_roc_auc
value: [0.65873016 0.85582011 0.81613757 0.83465608 0.80026455 0.72949735
 0.85449735 0.85582011 0.83862434 0.74669312]

mean value: 0.799074074074074

key: train_roc_auc
value: [0.82594358 0.87085347 0.89104741 0.88298289 0.90717644 0.8928105
 0.89485112 0.88874559 0.88877008 0.89077805]

mean value: 0.8833959122371686

key: test_jcc
value: [0.55813953 0.75757576 0.65517241 0.68965517 0.66666667 0.53125
 0.75       0.73333333 0.68965517 0.57575758]

mean value: 0.6607205626837744

key: train_jcc
value: [0.73939394 0.75572519 0.79069767 0.7751938  0.82170543 0.7953668
 0.80076628 0.78599222 0.78846154 0.79069767]

mean value: 0.7844000539129116

MCC on Blind test: 0.3

Accuracy on Blind test: 0.74
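For a binary task the Jaccard score J = TP/(TP+FP+FN) and the F1 score F1 = 2TP/(2TP+FP+FN) are linked by F1 = 2J/(1+J). The log does not state how the scorers are defined. Assuming `test_jcc` and `test_fscore` are the positive-class binary Jaccard and F1, the QDA fold values above should satisfy this identity up to print rounding. The sketch below checks that assumption.

```python
# Per-fold QDA scores copied from the test_jcc and test_fscore blocks above.
# Assumption: both are binary, positive-class metrics on the same predictions.
test_jcc = [0.55813953, 0.75757576, 0.65517241, 0.68965517, 0.66666667, 0.53125,
            0.75, 0.73333333, 0.68965517, 0.57575758]
test_fscore = [0.71641791, 0.86206897, 0.79166667, 0.81632653, 0.8, 0.69387755,
               0.85714286, 0.84615385, 0.81632653, 0.73076923]

def f1_from_jaccard(j):
    """Binary identity: F1 = 2J / (1 + J)."""
    return 2 * j / (1 + j)

# Largest disagreement between the logged F1 and the F1 implied by the logged Jaccard.
max_err = max(abs(f1_from_jaccard(j) - f) for j, f in zip(test_jcc, test_fscore))
print(f"max |F1 - 2J/(1+J)| over {len(test_jcc)} folds: {max_err:.2e}")
```

Because the mapping is monotonic, ranking folds or models by `test_jcc` gives the same order as ranking them by `test_fscore`.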

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02006578 0.03406477 0.0319221  0.02957749 0.03143239 0.0272727
 0.02576208 0.02969098 0.03094196 0.0119288 ]

mean value: 0.02726590633392334

key: score_time
value: [0.01922798 0.01971698 0.03047991 0.01091385 0.0175004  0.01788068
 0.02109575 0.01848054 0.02028871 0.0110805 ]

mean value: 0.018666529655456544

key: test_mcc
value: [0.63745526 0.78410665 0.56841568 0.89153439 0.78410665 0.8565805
 0.85695439 0.89602867 0.89139151 0.74569602]

mean value: 0.7912269728112449

key: train_mcc
value: [0.82706373 0.81437091 0.83515329 0.8265827  0.84258914 0.82682144
 0.81851887 0.81438908 0.81873585 0.81457838]

mean value: 0.8238803386905313

key: test_accuracy
value: [0.81818182 0.89090909 0.78181818 0.94545455 0.89090909 0.92727273
 0.92727273 0.94545455 0.94545455 0.87272727]

mean value: 0.8945454545454545

key: train_accuracy
value: [0.91313131 0.90707071 0.91717172 0.91313131 0.92121212 0.91313131
 0.90909091 0.90707071 0.90909091 0.90707071]

mean value: 0.9117171717171717

key: test_fscore
value: [0.80769231 0.89285714 0.76       0.94545455 0.89285714 0.93103448
 0.92592593 0.94915254 0.94736842 0.87719298]

mean value: 0.8929535493427339

key: train_fscore
value: [0.91518738 0.90836653 0.91913215 0.91451292 0.92215569 0.91451292
 0.91017964 0.908      0.91053678 0.90836653]

mean value: 0.9130950547952092

key: test_precision
value: [0.84       0.86206897 0.82608696 0.92857143 0.86206897 0.9
 0.96153846 0.90322581 0.93103448 0.86206897]

mean value: 0.8876664032393586

key: train_precision
value: [0.8957529  0.8976378  0.8996139  0.90196078 0.91304348 0.8984375
 0.8976378  0.8972332  0.89453125 0.89411765]

mean value: 0.8989966247132423

key: test_recall
value: [0.77777778 0.92592593 0.7037037  0.96296296 0.92592593 0.96428571
 0.89285714 1.         0.96428571 0.89285714]

mean value: 0.9010582010582011

key: train_recall
value: [0.93548387 0.91935484 0.93951613 0.92741935 0.93145161 0.93117409
 0.92307692 0.91902834 0.92712551 0.92307692]

mean value: 0.9276707587828131

key: test_roc_auc
value: [0.81746032 0.89153439 0.78042328 0.9457672  0.89153439 0.9265873
 0.92791005 0.94444444 0.94510582 0.8723545 ]

mean value: 0.8943121693121694

key: train_roc_auc
value: [0.91308607 0.90704584 0.91712649 0.91310239 0.92119139 0.91316769
 0.90911911 0.90709482 0.90912727 0.90710298]

mean value: 0.9117164032911061

key: test_jcc
value: [0.67741935 0.80645161 0.61290323 0.89655172 0.80645161 0.87096774
 0.86206897 0.90322581 0.9        0.78125   ]

mean value: 0.8117290044493882

key: train_jcc
value: [0.84363636 0.83211679 0.85036496 0.84249084 0.85555556 0.84249084
 0.83516484 0.83150183 0.83576642 0.83211679]

mean value: 0.840120523434392

MCC on Blind test: 0.25

Accuracy on Blind test: 0.71
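The first-fold Ridge Classifier scores above are all reproduced by a single confusion matrix: TP = 21, FN = 6, TN = 24, FP = 4. That is 27 positives and 28 negatives in a 55-sample test fold. These counts are inferred from the logged scores; the pipeline does not print them. The sketch below recomputes every metric from those counts. It also shows that `test_roc_auc` matches balanced accuracy, (TPR + TNR)/2. This is what ROC AUC reduces to when it is scored on hard 0/1 predictions instead of continuous scores, which suggests, but does not prove, that the pipeline's AUC scorer works on predicted labels.

```python
from math import sqrt

# Confusion counts for Ridge Classifier fold 1, inferred from the logged scores
# (an assumption: the pipeline does not print confusion matrices).
TP, FN, TN, FP = 21, 6, 24, 4

def fold_metrics(tp, fn, tn, fp):
    """Binary classification metrics computed from confusion-matrix counts."""
    recall = tp / (tp + fn)        # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return {
        "mcc": (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": recall,
        # AUC of hard 0/1 predictions equals balanced accuracy (assumed scoring mode).
        "roc_auc": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),
    }

# First-fold values copied from the Ridge Classifier test_* blocks above.
logged = {"mcc": 0.63745526, "accuracy": 0.81818182, "fscore": 0.80769231,
          "precision": 0.84, "recall": 0.77777778, "roc_auc": 0.81746032,
          "jcc": 0.67741935}

computed = fold_metrics(TP, FN, TN, FP)
for name, value in computed.items():
    print(f"{name:9s} computed={value:.8f} logged={logged[name]:.8f}")
```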

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.1858604  0.14565659 0.11904073 0.20053315 0.21871543 0.19877052
 0.23351502 0.23374367 0.09664297 0.18243432]

mean value: 0.18149127960205078

key: score_time
value: [0.02734494 0.01139116 0.02203178 0.01164579 0.0204699  0.02237654
 0.02139163 0.02069759 0.01105142 0.02118349]

mean value: 0.018958425521850585

key: test_mcc
value: [0.67284827 0.78410665 0.64214885 0.89153439 0.78410665 0.8565805
 0.89153439 0.8565805  0.92724868 0.78353876]

mean value: 0.809022763950793

key: train_mcc
value: [0.8393547  0.81437091 0.86751154 0.85478898 0.85067196 0.86365469
 0.85916382 0.86308561 0.84716822 0.83883199]

mean value: 0.8498602425342839

key: test_accuracy
value: [0.83636364 0.89090909 0.81818182 0.94545455 0.89090909 0.92727273
 0.94545455 0.92727273 0.96363636 0.89090909]

mean value: 0.9036363636363636

key: train_accuracy
value: [0.91919192 0.90707071 0.93333333 0.92727273 0.92525253 0.93131313
 0.92929293 0.93131313 0.92323232 0.91919192]

mean value: 0.9246464646464646

key: test_fscore
value: [0.83018868 0.89285714 0.8        0.94545455 0.89285714 0.93103448
 0.94545455 0.93103448 0.96428571 0.89655172]

mean value: 0.9029718459809546

key: train_fscore
value: [0.92125984 0.90836653 0.93491124 0.92828685 0.9261477  0.93280632
 0.9304175  0.93227092 0.92460317 0.92031873]

mean value: 0.9259388811346168

key: test_precision
value: [0.84615385 0.86206897 0.86956522 0.92857143 0.86206897 0.9
 0.96296296 0.9        0.96428571 0.86666667]

mean value: 0.8962343767066405

key: train_precision
value: [0.9        0.8976378  0.91505792 0.91732283 0.91699605 0.91119691
 0.9140625  0.91764706 0.90661479 0.90588235]

mean value: 0.910241820136384

key: test_recall
value: [0.81481481 0.92592593 0.74074074 0.96296296 0.92592593 0.96428571
 0.92857143 0.96428571 0.96428571 0.92857143]

mean value: 0.912037037037037

key: train_recall
value: [0.94354839 0.91935484 0.95564516 0.93951613 0.93548387 0.95546559
 0.94736842 0.94736842 0.94331984 0.93522267]

mean value: 0.942229332636803

key: test_roc_auc
value: [0.83597884 0.89153439 0.81679894 0.9457672  0.89153439 0.9265873
 0.9457672  0.9265873  0.96362434 0.89021164]

mean value: 0.9034391534391535

key: train_roc_auc
value: [0.91914261 0.90704584 0.93328817 0.92724794 0.92523181 0.93136183
 0.92932937 0.9313455  0.92327282 0.91922424]

mean value: 0.9246490139741413

key: test_jcc
value: [0.70967742 0.80645161 0.66666667 0.89655172 0.80645161 0.87096774
 0.89655172 0.87096774 0.93103448 0.8125    ]

mean value: 0.8267820726733407

key: train_jcc
value: [0.8540146  0.83211679 0.87777778 0.866171   0.86245353 0.87407407
 0.86988848 0.87313433 0.8597786  0.85239852]

mean value: 0.8621807699995009

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7
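The mean cross-validated `test_mcc` and the blind-test MCC can rank models in opposite orders. The sketch below uses only numbers copied from this log. It tabulates both figures for the four models reported in full in this part of the log and ranks the models by each. Gradient Boosting shows the widest spread: train scores of 1.0 on every fold, a CV MCC of about 0.93, and a blind-test MCC of 0.1. This pattern is commonly associated with overfitting or with a shift between the training data and the blind-test set, but the log alone does not show which applies.

```python
# (mean CV test_mcc, blind-test MCC) copied from this log; nothing is recomputed.
results = {
    "Gradient Boosting": (0.9284220498029769, 0.10),
    "QDA": (0.6100725452940903, 0.30),
    "Ridge Classifier": (0.7912269728112449, 0.25),
    "Ridge ClassifierCV": (0.809022763950793, 0.25),
}

# Rank model names by each criterion, best first.
by_cv = sorted(results, key=lambda m: results[m][0], reverse=True)
by_blind = sorted(results, key=lambda m: results[m][1], reverse=True)

for model, (cv_mcc, blind_mcc) in results.items():
    print(f"{model:20s} CV={cv_mcc:.3f} blind={blind_mcc:.2f} drop={cv_mcc - blind_mcc:.3f}")
print("Best by CV MCC:   ", by_cv[0])
print("Best by blind MCC:", by_blind[0])
```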

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03266501 0.02544355 0.02698326 0.03438282 0.02959514 0.02234364
 0.02767587 0.02475405 0.0223968  0.02460122]

mean value: 0.02708413600921631

key: score_time
value: [0.01088929 0.01079488 0.01119494 0.01080465 0.01083374 0.01082158
 0.01083565 0.01080179 0.01084828 0.01076126]

mean value: 0.010858607292175294

key: test_mcc
value: [0.86189955 0.76689254 0.75462449 0.9321832  0.75047877 0.89342711
 0.85933785 0.82195294 0.71611487 0.82195294]

mean value: 0.8178864271069979

key: train_mcc
value: [0.83842049 0.85032927 0.83456039 0.8314851  0.84698856 0.80724303
 0.8154727  0.81912621 0.84698856 0.82718204]

mean value: 0.8317796354275596

key: test_accuracy
value: [0.92982456 0.87719298 0.87719298 0.96491228 0.875      0.94642857
 0.92857143 0.91071429 0.85714286 0.91071429]

mean value: 0.9077694235588972

key: train_accuracy
value: [0.91913215 0.92504931 0.91715976 0.91518738 0.92322835 0.90354331
 0.90748031 0.90944882 0.92322835 0.91338583]

mean value: 0.9156843560235444

key: test_fscore
value: [0.93103448 0.8852459  0.88135593 0.96428571 0.87719298 0.94545455
 0.92592593 0.90909091 0.86206897 0.9122807 ]

mean value: 0.9093936061086217

key: train_fscore
value: [0.92007797 0.92607004 0.91796875 0.91714836 0.9245648  0.90448343
 0.90909091 0.91050584 0.9245648  0.91472868]

mean value: 0.9169203576302117

key: test_precision
value: [0.9        0.81818182 0.86666667 1.         0.86206897 0.96296296
 0.96153846 0.92592593 0.83333333 0.89655172]

mean value: 0.9027229858264341

key: train_precision
value: [0.91119691 0.91538462 0.90733591 0.89473684 0.90874525 0.8957529
 0.89353612 0.9        0.90874525 0.90076336]

mean value: 0.90361971465238

key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.92857143
 0.89285714 0.89285714 0.89285714 0.92857143]

mean value: 0.918472906403941

key: train_recall
value: [0.92913386 0.93700787 0.92885375 0.94071146 0.94094488 0.91338583
 0.92519685 0.92125984 0.94094488 0.92913386]

mean value: 0.9306573091407052

key: test_roc_auc
value: [0.93041872 0.87869458 0.87684729 0.96551724 0.875      0.94642857
 0.92857143 0.91071429 0.85714286 0.91071429]

mean value: 0.9080049261083745

key: train_roc_auc
value: [0.91911238 0.92502568 0.91718278 0.91523762 0.92322835 0.90354331
 0.90748031 0.90944882 0.92322835 0.91338583]

mean value: 0.9156873424418785

key: test_jcc
value: [0.87096774 0.79411765 0.78787879 0.93103448 0.78125    0.89655172
 0.86206897 0.83333333 0.75757576 0.83870968]

mean value: 0.8353488117615334

key: train_jcc
value: [0.85198556 0.86231884 0.84837545 0.84697509 0.85971223 0.82562278
 0.83333333 0.83571429 0.85971223 0.84285714]

mean value: 0.8466606938515135

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71
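Python warnings written to stderr are interleaved with the model output throughout this log. Examples include the QDA collinearity `UserWarning`, the pandas `SettingWithCopyWarning`, and the lbfgs `ConvergenceWarning` that follows. In some places a warning is glued onto the end of a printed line such as `List of models:`. The sketch below is one way to separate the two streams when post-processing the log. It assumes each warning starts with Python's default `file.py:LINE: CategoryWarning:` header and ends with the echoed source line, which is indented by exactly two spaces. `split_warnings` is a hypothetical helper, and the sample lines are shortened from this log.

```python
import re

# Default header printed by Python's warnings module: "file.py:LINE: CategoryWarning: msg".
HEADER = re.compile(r"\S+\.py:\d+: \w+Warning:")
# The echoed source line that closes each warning is indented by exactly two spaces.
SOURCE_LINE = re.compile(r"^  \S")

def split_warnings(lines):
    """Hypothetical helper: return (output_lines, warning_lines)."""
    output, warning_lines = [], []
    in_warning = False
    for line in lines:
        if in_warning:
            warning_lines.append(line)
            if SOURCE_LINE.match(line):
                in_warning = False  # source echo ends this warning
            continue
        match = HEADER.search(line)
        if match:
            prefix = line[:match.start()].rstrip()
            if prefix:
                output.append(prefix)  # keep stdout text the warning was glued onto
            warning_lines.append(line[match.start():])
            in_warning = True
        else:
            output.append(line)
    return output, warning_lines

SAMPLE_LOG = """\
Model func: QuadraticDiscriminantAnalysis()
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
[('QDA', QuadraticDiscriminantAnalysis())]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  n_iter_i = _check_optimize_result(
MCC on Blind test: 0.3
"""

clean, warned = split_warnings(SAMPLE_LOG.splitlines())
print("\n".join(clean))
print(f"{len(warned)} warning lines removed")
```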

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
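
The warning suggests two remedies: raise `max_iter` or scale the features. The minimal sketch below combines both. It assumes the model can be wrapped with scikit-learn's `make_pipeline`, although the script's actual model-construction code is not shown in this log. It uses synthetic data from `make_classification` as a stand-in for the real feature matrix. The helper `fit_scaled_logreg_cv` and the `max_iter=5000` value are illustrative choices and are not taken from the original script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_scaled_logreg_cv(X, y, max_iter=5000):
    """Standardise features, then fit LogisticRegressionCV with a raised iteration cap.

    Returns the fitted pipeline and the number of ConvergenceWarnings raised during fit.
    """
    model = make_pipeline(
        StandardScaler(),  # rescale each feature to zero mean and unit variance
        LogisticRegressionCV(max_iter=max_iter, random_state=42),  # default solver is lbfgs
    )
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, n_conv


# Synthetic stand-in data (assumption): the real feature matrix is not part of this log.
X, y = make_classification(n_samples=557, n_features=53, random_state=42)
model, n_conv = fit_scaled_logreg_cv(X, y)
print("ConvergenceWarnings during fit:", n_conv)
print("training accuracy:", round(model.score(X, y), 3))
```

The scaler sits inside the pipeline. As a result, when the pipeline is passed to `cross_validate`, the scaler is refit on each training fold, and the held-out fold never contributes to the scaling statistics.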
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
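The model list printed above contains ('Naive Bayes', BernoulliNB()) twice, at the fourth and thirteenth positions. This log does not show how the script stores results, so whether this matters is unclear. If results are collected in a dict keyed by model name, one BernoulliNB run would overwrite the other. In any case, the same model is evaluated twice. Below is a small sketch that flags repeated names. It uses an abbreviated, illustrative copy of the list rather than the script's own code:

```python
from collections import Counter

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, GaussianNB

# Abbreviated copy of the (name, estimator) list printed in the log;
# only a few entries are reproduced here for illustration.
models = [
    ("Logistic Regression", LogisticRegression(random_state=42)),
    ("Gaussian NB", GaussianNB()),
    ("Naive Bayes", BernoulliNB()),
    ("Naive Bayes", BernoulliNB()),  # appears twice in the logged list
]

# Count how often each display name occurs and report any repeats.
name_counts = Counter(name for name, _ in models)
duplicates = sorted(name for name, count in name_counts.items() if count > 1)
print("Duplicate model names:", duplicates)

# A dict keyed by name silently keeps only the last entry for each duplicate.
print(len(models), "entries ->", len(dict(models)), "unique names")
```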
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
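The lbfgs ConvergenceWarnings above come from logistic-regression fits. lbfgs is the default solver for both LogisticRegression and LogisticRegressionCV, and both default to max_iter=100. The pipeline printed here already applies MinMaxScaler to the numerical features. Of the two remedies the warning suggests, raising max_iter is therefore probably the more relevant one. The minimal sketch below uses synthetic data: make_classification stands in for the real features, and max_iter=1000 is an illustrative value, not a tuned one. It records any ConvergenceWarning raised during the fit:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (assumption: binary target).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Same scaler as the logged pipeline, but with a larger iteration budget
# for the lbfgs solver than the default of 100.
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)

# Record warnings instead of printing them, then count convergence failures.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence)
```

LogisticRegressionCV accepts the same max_iter parameter. The weakly regularised end of its Cs grid (large C) may still need more iterations than the default.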

key: fit_time
value: [0.76443863 0.82875395 0.6898253  0.68586373 0.77263117 0.67486453
 0.67990541 0.83567691 0.67201447 0.71364689]

mean value: 0.7317620992660523

key: score_time
value: [0.01203632 0.01211286 0.0111289  0.01215601 0.01238728 0.02110219
 0.01246333 0.01241684 0.01234293 0.01244378]

mean value: 0.013059043884277343

key: test_mcc
value: [0.85960591 0.86189955 0.85960591 0.9321832  0.85933785 0.96490128
 0.96490128 0.85714286 0.82195294 0.89342711]

mean value: 0.8874957891973448

key: train_mcc
value: [0.94480322 0.92902382 0.93691156 0.93691156 0.95287407 0.93712408
 0.94491118 0.94095217 0.93703692 0.93703692]

mean value: 0.9397585529859461

key: test_accuracy
value: [0.92982456 0.92982456 0.92982456 0.96491228 0.92857143 0.98214286
 0.98214286 0.92857143 0.91071429 0.94642857]

mean value: 0.943295739348371

key: train_accuracy
value: [0.97238659 0.96449704 0.96844181 0.96844181 0.97637795 0.96850394
 0.97244094 0.97047244 0.96850394 0.96850394]

mean value: 0.9698570407988942

key: test_fscore
value: [0.92857143 0.93103448 0.93103448 0.96428571 0.93103448 0.98245614
 0.98181818 0.92857143 0.9122807  0.94545455]

mean value: 0.9436541589082423

key: train_fscore
value: [0.97233202 0.96442688 0.96825397 0.96825397 0.97619048 0.96825397
 0.97233202 0.9704142  0.96837945 0.96837945]

mean value: 0.9697216384507354

key: test_precision
value: [0.92857143 0.9        0.93103448 1.         0.9        0.96551724
 1.         0.92857143 0.89655172 0.96296296]

mean value: 0.9413209268381683

key: train_precision
value: [0.97619048 0.96825397 0.97211155 0.97211155 0.984      0.976
 0.97619048 0.97233202 0.97222222 0.97222222]

mean value: 0.9741634488459363

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.96428571 1.
 0.96428571 0.92857143 0.92857143 0.92857143]

mean value: 0.9469211822660099

key: train_recall
value: [0.96850394 0.96062992 0.96442688 0.96442688 0.96850394 0.96062992
 0.96850394 0.96850394 0.96456693 0.96456693]

mean value: 0.9653263203759609

key: test_roc_auc
value: [0.92980296 0.93041872 0.92980296 0.96551724 0.92857143 0.98214286
 0.98214286 0.92857143 0.91071429 0.94642857]

mean value: 0.9434113300492611

key: train_roc_auc
value: [0.97239426 0.96450468 0.96843391 0.96843391 0.97637795 0.96850394
 0.97244094 0.97047244 0.96850394 0.96850394]

mean value: 0.9698569916902681

key: test_jcc
value: [0.86666667 0.87096774 0.87096774 0.93103448 0.87096774 0.96551724
 0.96428571 0.86666667 0.83870968 0.89655172]

mean value: 0.8942335399120717

key: train_jcc
value: [0.94615385 0.93129771 0.93846154 0.93846154 0.95348837 0.93846154
 0.94615385 0.94252874 0.93869732 0.93869732]

mean value: 0.9412401761356505
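The key/value blocks above match the dictionary returned by scikit-learn's cross_validate when it is given a dict of scorers and return_train_score=True. That dictionary holds fit_time and score_time first, then a test_ and train_ entry per scorer, each with one value per fold (10 here). Each "mean value" is the mean of that array.

The sketch below produces the same keys. The scorer definitions are assumptions inferred from the key names. The logged test_roc_auc values track test_accuracy almost exactly. That pattern suggests ROC AUC computed from hard class predictions (equivalent to balanced accuracy) on near-balanced folds, so roc_auc is sketched that way. The synthetic data and GaussianNB are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Placeholder data with the same row count as the logged training data.
X, y = make_classification(n_samples=557, n_features=20, random_state=42)

# Scorer names chosen so the returned keys match the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # Assumption: AUC from hard predictions, i.e. balanced accuracy.
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(GaussianNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```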

MCC on Blind test: 0.2

Accuracy on Blind test: 0.59
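Each model is then scored on a separate blind test set, with MCC and accuracy apparently rounded to two decimals. For LogisticRegressionCV, the 10-fold test MCC averages about 0.89 while the blind-test MCC is 0.2. A drop that large suggests the blind set differs from the training data in a way this log does not show. The sketch below only illustrates how such figures can be computed. The stratified 70/30 split of synthetic data is an assumption and will not reproduce the drop:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=800, n_features=20, random_state=42)

# Hold out a "blind" set that is never used for cross-validation or fitting.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_blind)

mcc = matthews_corrcoef(y_blind, y_pred)
acc = accuracy_score(y_blind, y_pred)
print("MCC on Blind test:", round(mcc, 2))
print("Accuracy on Blind test:", round(acc, 2))
```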

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])
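Each "Running model pipeline" entry prints the same preprocessing, followed by the model under test. The preprocessing is a ColumnTransformer that MinMax-scales the numerical columns, one-hot encodes the categorical ones, and passes any remaining columns through. The sketch below rebuilds that structure on a hypothetical toy DataFrame that reuses a few column names from the log. The data, the column subset, and sparse_threshold=0 are assumptions for illustration. The last one forces a dense matrix, which GaussianNB requires:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 120

# Hypothetical toy frame reusing a few column names from the log; values are random.
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "deepddg": rng.normal(0, 1, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "C"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = (X["deepddg"] + rng.normal(0, 0.5, n) > 0).astype(int)

num_cols = list(X.select_dtypes(include="number").columns)
cat_cols = list(X.select_dtypes(exclude="number").columns)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",  # any unlisted column is passed through unchanged
    sparse_threshold=0,       # force dense output; GaussianNB rejects sparse input
)

pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])
pipe.fit(X, y)
print(pipe.named_steps["prep"].transform(X).shape)
```

Swapping GaussianNB for any other estimator in the ("model", ...) step gives the other pipelines in the log.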

key: fit_time
value: [0.01082301 0.01014185 0.00848889 0.00828099 0.00836778 0.00830722
 0.00840569 0.00863934 0.00839567 0.00852156]

mean value: 0.008837199211120606

key: score_time
value: [0.01088572 0.00903606 0.0087676  0.00838327 0.0087471  0.00870657
 0.00868726 0.00890255 0.00878382 0.00867343]

mean value: 0.008957338333129884

key: test_mcc
value: [0.50927421 0.65018988 0.7366424  0.64889453 0.65814518 0.58501794
 0.80439967 0.61706091 0.64951905 0.61706091]

mean value: 0.6476204667782233

key: train_mcc
value: [0.67420459 0.6683308  0.66925612 0.67734922 0.69555499 0.6527166
 0.65044798 0.70356186 0.67461719 0.68157216]

mean value: 0.6747611503976669

key: test_accuracy
value: [0.75438596 0.8245614  0.85964912 0.80701754 0.82142857 0.78571429
 0.89285714 0.80357143 0.82142857 0.80357143]

mean value: 0.8174185463659148

key: train_accuracy
value: [0.82840237 0.82642998 0.82642998 0.83234714 0.84251969 0.81889764
 0.81692913 0.8484252  0.83070866 0.83464567]

mean value: 0.830573545170759

key: test_fscore
value: [0.74074074 0.81481481 0.84615385 0.7755102  0.8        0.76
 0.88       0.78431373 0.80769231 0.78431373]

mean value: 0.7993539364463734

key: train_fscore
value: [0.80709534 0.8061674  0.80444444 0.81400438 0.82758621 0.79735683
 0.79379157 0.8372093  0.81222707 0.8173913 ]

mean value: 0.8117273855652805

key: test_precision
value: [0.76923077 0.84615385 0.95652174 0.95       0.90909091 0.86363636
 1.         0.86956522 0.875      0.86956522]

mean value: 0.8908764062024932

key: train_precision
value: [0.92385787 0.915      0.91878173 0.91176471 0.91428571 0.905
 0.90862944 0.90410959 0.91176471 0.91262136]

mean value: 0.9125815109847812

key: test_recall
value: [0.71428571 0.78571429 0.75862069 0.65517241 0.71428571 0.67857143
 0.78571429 0.71428571 0.75       0.71428571]

mean value: 0.7270935960591133

key: train_recall
value: [0.71653543 0.72047244 0.71541502 0.73517787 0.75590551 0.71259843
 0.70472441 0.77952756 0.73228346 0.74015748]

mean value: 0.7312797609784942

key: test_roc_auc
value: [0.75369458 0.82389163 0.8614532  0.80972906 0.82142857 0.78571429
 0.89285714 0.80357143 0.82142857 0.80357143]

mean value: 0.8177339901477833

key: train_roc_auc
value: [0.82862345 0.82663938 0.82621145 0.83215586 0.84251969 0.81889764
 0.81692913 0.8484252  0.83070866 0.83464567]

mean value: 0.8305756123369954

key: test_jcc
value: [0.58823529 0.6875     0.73333333 0.63333333 0.66666667 0.61290323
 0.78571429 0.64516129 0.67741935 0.64516129]

mean value: 0.6675428074455588

key: train_jcc
value: [0.67657993 0.67527675 0.67286245 0.68634686 0.70588235 0.66300366
 0.65808824 0.72       0.68382353 0.69117647]

mean value: 0.6833040246657276

MCC on Blind test: 0.34

Accuracy on Blind test: 0.78

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00908518 0.00873065 0.00850201 0.00850201 0.00844526 0.00834846
 0.00829315 0.00879455 0.00849795 0.00861597]

mean value: 0.00858151912689209

key: score_time
value: [0.00892878 0.0087173  0.00866246 0.00867057 0.00865889 0.00869465
 0.0087862  0.00887084 0.00837755 0.00877047]

mean value: 0.008713769912719726

key: test_mcc
value: [0.79778885 0.72706729 0.79110556 0.66755025 0.71611487 0.78772636
 0.79385662 0.75047877 0.67900461 0.75047877]

mean value: 0.7461171974035183

key: train_mcc
value: [0.77122271 0.76334013 0.76731664 0.68276748 0.78361641 0.76800824
 0.76819892 0.77588525 0.78361641 0.77574087]

mean value: 0.763971305717051

key: test_accuracy
value: [0.89473684 0.85964912 0.89473684 0.80701754 0.85714286 0.89285714
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.868828320802005

key: train_accuracy
value: [0.88560158 0.8816568  0.88362919 0.84023669 0.89173228 0.88385827
 0.88385827 0.88779528 0.89173228 0.88779528]

mean value: 0.8817895913898336

key: test_fscore
value: [0.9        0.86666667 0.9        0.76595745 0.86206897 0.88888889
 0.88461538 0.87272727 0.84210526 0.87719298]

mean value: 0.8660222870838

key: train_fscore
value: [0.88627451 0.88142292 0.88408644 0.83298969 0.89278752 0.88543689
 0.88588008 0.88932039 0.89278752 0.88888889]

mean value: 0.8819874865979285

key: test_precision
value: [0.84375    0.8125     0.87096774 1.         0.83333333 0.92307692
 0.95833333 0.88888889 0.82758621 0.86206897]

mean value: 0.8820505392981756

key: train_precision
value: [0.8828125  0.88492063 0.87890625 0.87068966 0.88416988 0.87356322
 0.87072243 0.87739464 0.88416988 0.88030888]

mean value: 0.8787657976607903

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.62068966 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8623152709359606

key: train_recall
value: [0.88976378 0.87795276 0.88932806 0.79841897 0.9015748  0.8976378
 0.9015748  0.9015748  0.9015748  0.8976378 ]

mean value: 0.8857038374155799

key: test_roc_auc
value: [0.89593596 0.86083744 0.89408867 0.81034483 0.85714286 0.89285714
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.8693349753694581

key: train_roc_auc
value: [0.88559335 0.88166412 0.88364041 0.84015437 0.89173228 0.88385827
 0.88385827 0.88779528 0.89173228 0.88779528]

mean value: 0.881782390837509

key: test_jcc
value: [0.81818182 0.76470588 0.81818182 0.62068966 0.75757576 0.8
 0.79310345 0.77419355 0.72727273 0.78125   ]

mean value: 0.7655154655400436

key: train_jcc
value: [0.79577465 0.78798587 0.79225352 0.71378092 0.80633803 0.79442509
 0.79513889 0.8006993  0.80633803 0.8       ]

mean value: 0.7892734286500613

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.0078218  0.00819182 0.00788093 0.00713396 0.00716805 0.00795126
 0.00721955 0.00793576 0.00724411 0.00745344]

mean value: 0.007600069046020508

key: score_time
value: [0.01292777 0.01269674 0.01303506 0.01127744 0.01387811 0.01285839
 0.01185989 0.0118773  0.0108037  0.01251173]

mean value: 0.012372612953186035

key: test_mcc
value: [0.72706729 0.68850906 0.71921182 0.8953202  0.71611487 0.68250015
 0.79385662 0.75047877 0.67900461 0.75047877]

mean value: 0.7402542168196266

key: train_mcc
value: [0.79496359 0.79887642 0.78334713 0.77932046 0.79951627 0.76777009
 0.79936749 0.79163927 0.80324922 0.79530025]

mean value: 0.7913350175140176

key: test_accuracy
value: [0.85964912 0.84210526 0.85964912 0.94736842 0.85714286 0.83928571
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.868734335839599

key: train_accuracy
value: [0.8974359  0.89940828 0.89151874 0.88954635 0.8996063  0.88385827
 0.8996063  0.89566929 0.9015748  0.8976378 ]

mean value: 0.8955862026122474

key: test_fscore
value: [0.86666667 0.84745763 0.86206897 0.94736842 0.86206897 0.83018868
 0.88461538 0.87272727 0.84210526 0.87719298]

mean value: 0.86924602280744

key: train_fscore
value: [0.8984375  0.8990099  0.89278752 0.890625   0.90097087 0.88454012
 0.9005848  0.89708738 0.90234375 0.89803922]

mean value: 0.8964426056208497

key: test_precision
value: [0.8125     0.80645161 0.86206897 0.96428571 0.83333333 0.88
 0.95833333 0.88888889 0.82758621 0.86206897]

mean value: 0.869551702067553

key: train_precision
value: [0.89147287 0.90438247 0.88076923 0.88030888 0.88888889 0.87937743
 0.89189189 0.88505747 0.89534884 0.89453125]

mean value: 0.8892029220575753

key: test_recall
value: [0.92857143 0.89285714 0.86206897 0.93103448 0.89285714 0.78571429
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8721674876847291

key: train_recall
value: [0.90551181 0.89370079 0.90513834 0.90118577 0.91338583 0.88976378
 0.90944882 0.90944882 0.90944882 0.9015748 ]

mean value: 0.9038607575238866

key: test_roc_auc
value: [0.86083744 0.8429803  0.85960591 0.9476601  0.85714286 0.83928571
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.8689655172413794

key: train_roc_auc
value: [0.89741994 0.89941956 0.89154555 0.88956926 0.8996063  0.88385827
 0.8996063  0.89566929 0.9015748  0.8976378 ]

mean value: 0.8955907067940618

key: test_jcc
value: [0.76470588 0.73529412 0.75757576 0.9        0.75757576 0.70967742
 0.79310345 0.77419355 0.72727273 0.78125   ]

mean value: 0.770064865844204

key: train_jcc
value: [0.81560284 0.81654676 0.80633803 0.8028169  0.81978799 0.79298246
 0.81914894 0.81338028 0.82206406 0.81494662]

mean value: 0.8123614865069838

MCC on Blind test: 0.25

Accuracy on Blind test: 0.72

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01474333 0.01426578 0.01444244 0.01453424 0.01460671 0.01486588
 0.01472378 0.01466084 0.01448703 0.01444364]

mean value: 0.014577364921569825

key: score_time
value: [0.00919628 0.00896859 0.00908899 0.00891948 0.00902748 0.00907803
 0.00912905 0.00929427 0.00912786 0.00900006]

mean value: 0.009083008766174317

key: test_mcc
value: [0.82942474 0.76689254 0.79110556 0.89988258 0.71611487 0.78772636
 0.79385662 0.78772636 0.67900461 0.71428571]

mean value: 0.776601995146589

key: train_mcc
value: [0.78308641 0.79093074 0.78708603 0.77160078 0.79537422 0.77974514
 0.78395685 0.78779242 0.79537422 0.78351922]

mean value: 0.7858466034660538

key: test_accuracy
value: [0.9122807  0.87719298 0.89473684 0.94736842 0.85714286 0.89285714
 0.89285714 0.89285714 0.83928571 0.85714286]

mean value: 0.8863721804511278

key: train_accuracy
value: [0.89151874 0.89546351 0.89349112 0.88560158 0.8976378  0.88976378
 0.89173228 0.89370079 0.8976378  0.89173228]

mean value: 0.8928279675099784

key: test_fscore
value: [0.91525424 0.8852459  0.9        0.94545455 0.86206897 0.88888889
 0.88461538 0.88888889 0.84210526 0.85714286]

mean value: 0.8869664932593181

key: train_fscore
value: [0.89236791 0.89587426 0.89411765 0.88715953 0.8984375  0.89105058
 0.89361702 0.89534884 0.8984375  0.89236791]

mean value: 0.8938778697670609

key: test_precision
value: [0.87096774 0.81818182 0.87096774 1.         0.83333333 0.92307692
 0.95833333 0.92307692 0.82758621 0.85714286]

mean value: 0.8882666878912708

key: train_precision
value: [0.88715953 0.89411765 0.88715953 0.87356322 0.89147287 0.88076923
 0.878327   0.88167939 0.89147287 0.88715953]

mean value: 0.8852880817385453

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.85714286]

mean value: 0.8899014778325123

key: train_recall
value: [0.8976378  0.8976378  0.90118577 0.90118577 0.90551181 0.9015748
 0.90944882 0.90944882 0.90551181 0.8976378 ]

mean value: 0.9026780990320874

key: test_roc_auc
value: [0.91317734 0.87869458 0.89408867 0.94827586 0.85714286 0.89285714
 0.89285714 0.89285714 0.83928571 0.85714286]

mean value: 0.8866379310344827

key: train_roc_auc
value: [0.89150664 0.89545921 0.89350627 0.88563226 0.8976378  0.88976378
 0.89173228 0.89370079 0.8976378  0.89173228]

mean value: 0.8928309109582646

key: test_jcc
value: [0.84375    0.79411765 0.81818182 0.89655172 0.75757576 0.8
 0.79310345 0.8        0.72727273 0.75      ]

mean value: 0.798055312250292

key: train_jcc
value: [0.80565371 0.8113879  0.80851064 0.7972028  0.81560284 0.80350877
 0.80769231 0.81052632 0.81560284 0.80565371]

mean value: 0.8081341825521712
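The per-fold scores can be cross-checked against one another. For SVM fold 1, the logged accuracy (0.9122807 = 52/57), precision (0.87096774 = 27/31) and recall (0.96428571 = 27/28) imply TP = 27, FP = 4, FN = 1 and TN = 25 on a 57-row test fold. These counts are inferred here and are not printed by the script. Recomputing every metric from them reproduces the logged fold-1 values. In particular, the logged `test_roc_auc` is consistent with the AUC of hard 0/1 predictions, which equals balanced accuracy; whether the script actually scores labels rather than `decision_function` output is an assumption.

```python
import math

def metrics_from_counts(tp, fp, fn, tn):
    """Binary classification metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # true-positive rate (sensitivity)
    specificity = tn / (tn + fp)     # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        # Matthews correlation coefficient; defined as 0 when a marginal is empty
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        # AUC of hard 0/1 predictions reduces to balanced accuracy
        "roc_auc": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),   # Jaccard index of the positive class
    }

# Counts inferred (not logged) for SVM fold 1: 57 test rows, 28 positives
m = metrics_from_counts(tp=27, fp=4, fn=1, tn=25)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
```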

MCC on Blind test: 0.22

Accuracy on Blind test: 0.71

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])
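The `prep` step of this pipeline min-max scales the numerical columns, one-hot encodes the categorical columns and passes any remaining columns through unchanged (`remainder='passthrough'`). A stdlib sketch of what the two transformers compute for a single column, with default settings. The sample values are made up for illustration; inside a fitted pipeline, scikit-learn learns the min/max and the category list on the training fold and reuses them on the test fold, which this sketch does not model.

```python
def min_max_scale(column):
    """Rescale values to [0, 1], like MinMaxScaler with its default feature_range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column has zero range; map it to 0.0 instead of dividing by zero
    return [(x - lo) / span if span else 0.0 for x in column]

def one_hot(column):
    """Return the sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(column))
    return categories, [[1 if x == c else 0 for c in categories] for x in column]

# Hypothetical values for a numerical and a categorical feature
print(min_max_scale([2.0, 4.0, 6.0]))
print(one_hot(["helix", "strand", "helix"]))
```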

key: fit_time
value: [1.39773488 1.62088108 1.47775269 1.50360155 1.53373218 1.47553325
 1.55788469 1.53873181 1.53706622 1.49211836]

mean value: 1.5135036706924438

key: score_time
value: [0.01128125 0.01326585 0.01378894 0.01334476 0.01340508 0.01640296
 0.01380563 0.01374269 0.0163722  0.01363611]

mean value: 0.013904547691345215

key: test_mcc
value: [0.8951918  0.86189955 0.82880708 0.82490815 0.78772636 0.85933785
 0.96490128 0.85714286 0.82195294 0.85714286]

mean value: 0.8559010729919259

key: train_mcc
value: [0.96067294 0.96450468 0.96847232 0.96844169 0.9645744  0.9645744
 0.9645744  0.97244848 0.9606597  0.98032256]

mean value: 0.9669245592847295

key: test_accuracy
value: [0.94736842 0.92982456 0.9122807  0.9122807  0.89285714 0.92857143
 0.98214286 0.92857143 0.91071429 0.92857143]

mean value: 0.9273182957393483

key: train_accuracy
value: [0.98027613 0.98224852 0.98422091 0.98422091 0.98228346 0.98228346
 0.98228346 0.98622047 0.98031496 0.99015748]

mean value: 0.9834509776514623

key: test_fscore
value: [0.94545455 0.93103448 0.91803279 0.91525424 0.89655172 0.92592593
 0.98181818 0.92857143 0.9122807  0.92857143]

mean value: 0.928349544316583

key: train_fscore
value: [0.98015873 0.98224852 0.98425197 0.98418972 0.98224852 0.98224852
 0.98224852 0.98619329 0.98023715 0.99017682]

mean value: 0.9834201770147663

key: test_precision
value: [0.96296296 0.9        0.875      0.9        0.86666667 0.96153846
 1.         0.92857143 0.89655172 0.92857143]

mean value: 0.921986267244888

key: train_precision
value: [0.988      0.98418972 0.98039216 0.98418972 0.98418972 0.98418972
 0.98418972 0.98814229 0.98412698 0.98823529]

mean value: 0.9849845344198285

key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 0.89285714
 0.96428571 0.92857143 0.92857143 0.92857143]

mean value: 0.9360837438423646

key: train_recall
value: [0.97244094 0.98031496 0.98814229 0.98418972 0.98031496 0.98031496
 0.98031496 0.98425197 0.97637795 0.99212598]

mean value: 0.9818788708723662

key: test_roc_auc
value: [0.94704433 0.93041872 0.91133005 0.91194581 0.89285714 0.92857143
 0.98214286 0.92857143 0.91071429 0.92857143]

mean value: 0.9272167487684729

key: train_roc_auc
value: [0.98029162 0.98225234 0.98422863 0.98422085 0.98228346 0.98228346
 0.98228346 0.98622047 0.98031496 0.99015748]

mean value: 0.9834536740219726

key: test_jcc
value: [0.89655172 0.87096774 0.84848485 0.84375    0.8125     0.86206897
 0.96428571 0.86666667 0.83870968 0.86666667]

mean value: 0.8670652005113907

key: train_jcc
value: [0.96108949 0.96511628 0.96899225 0.9688716  0.96511628 0.96511628
 0.96511628 0.97276265 0.96124031 0.98054475]

mean value: 0.9673966156908878
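To compare models without scrolling through the log, the `key:` / `value:` / `mean value:` blocks can be parsed back into numbers. A minimal regex-based sketch that assumes exactly the layout printed here (numpy-style arrays that may wrap onto a second line); `parse_cv_blocks`, `BLOCK` and `SAMPLE` are illustrative names, not part of the original script.

```python
import re
from statistics import mean

# One "key / value / mean value" block; the bracketed array may span two lines
BLOCK = re.compile(r"key: (\w+)\nvalue: \[([^\]]*)\]\s*\nmean value: ([0-9.eE+-]+)")

def parse_cv_blocks(text):
    """Return {key: (fold_values, logged_mean)} for every block found in text."""
    results = {}
    for key, values, logged_mean in BLOCK.findall(text):
        folds = [float(v) for v in values.split()]
        results[key] = (folds, float(logged_mean))
    return results

# Two blocks copied from the MLP run above
SAMPLE = """key: test_mcc
value: [0.8951918  0.86189955 0.82880708 0.82490815 0.78772636 0.85933785
 0.96490128 0.85714286 0.82195294 0.85714286]

mean value: 0.8559010729919259

key: train_jcc
value: [0.96108949 0.96511628 0.96899225 0.9688716  0.96511628 0.96511628
 0.96511628 0.97276265 0.96124031 0.98054475]

mean value: 0.9673966156908878
"""

parsed = parse_cv_blocks(SAMPLE)
for key, (folds, logged_mean) in parsed.items():
    print(f"{key}: {len(folds)} folds, recomputed {mean(folds):.6f}, logged {logged_mean:.6f}")
```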

MCC on Blind test: 0.26

Accuracy on Blind test: 0.66

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.0136857  0.01279187 0.01128888 0.01076555 0.01062655 0.01041436
 0.01054716 0.01079988 0.0110817  0.01173425]

mean value: 0.011373591423034669

key: score_time
value: [0.01080513 0.00837135 0.00845194 0.00823951 0.0084095  0.0082202
 0.00809073 0.00809741 0.00841331 0.0083375 ]

mean value: 0.008543658256530761

key: test_mcc
value: [0.92980296 0.8953202  0.82942474 0.96551724 0.75047877 0.89342711
 0.89342711 0.85933785 0.96490128 0.92857143]

mean value: 0.891020869070053

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.94736842 0.9122807  0.98245614 0.875      0.94642857
 0.94642857 0.92857143 0.98214286 0.96428571]

mean value: 0.9449874686716792

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.94736842 0.90909091 0.98245614 0.87719298 0.94736842
 0.94545455 0.92592593 0.98181818 0.96428571]

mean value: 0.9445246955773272

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.93103448 0.96153846 1.         0.86206897 0.93103448
 0.96296296 0.96153846 1.         0.96428571]

mean value: 0.9538749245645797

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.96551724 0.89285714 0.96428571
 0.92857143 0.89285714 0.96428571 0.96428571]

mean value: 0.9363300492610838

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96490148 0.9476601  0.91317734 0.98275862 0.875      0.94642857
 0.94642857 0.92857143 0.98214286 0.96428571]

mean value: 0.9451354679802957

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.9        0.83333333 0.96551724 0.78125    0.9
 0.89655172 0.86206897 0.96428571 0.93103448]

mean value: 0.8965075944170772
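The `test_jcc` values are not independent of `test_fscore`. Writing a = TP and b = FP + FN, the binary F1 score is 2a/(2a + b) and the Jaccard index is a/(a + b), so J = F1/(2 - F1). Applying this identity to the Decision Tree fold values logged above reproduces every logged `test_jcc`:

```python
def jaccard_from_f1(f1):
    """Binary Jaccard index from the F1 score: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)

# Decision Tree per-fold test_fscore and test_jcc, copied from the log above
dt_fscore = [0.96428571, 0.94736842, 0.90909091, 0.98245614, 0.87719298,
             0.94736842, 0.94545455, 0.92592593, 0.98181818, 0.96428571]
dt_jcc = [0.93103448, 0.9, 0.83333333, 0.96551724, 0.78125,
          0.9, 0.89655172, 0.86206897, 0.96428571, 0.93103448]

for f1, logged in zip(dt_fscore, dt_jcc):
    print(f"F1 {f1:.8f} -> Jaccard {jaccard_from_f1(f1):.8f} (logged {logged:.8f})")
```

Because the two metrics are monotonically related, they always rank folds and models identically; reporting both adds little information.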

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.11

Accuracy on Blind test: 0.36

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10608721 0.10524035 0.10447693 0.1020844  0.10217381 0.10169768
 0.10213351 0.1026423  0.10438013 0.10096812]

mean value: 0.10318844318389893

key: score_time
value: [0.01834702 0.01700187 0.01766229 0.0172255  0.01845121 0.0170753
 0.0172298  0.01727653 0.01692057 0.01812077]

mean value: 0.01753108501434326

key: test_mcc
value: [0.82942474 0.86189955 0.8615634  0.8953202  0.78772636 0.93094934
 0.89802651 0.78772636 0.75047877 0.85933785]

mean value: 0.84624530759693

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9122807  0.92982456 0.92982456 0.94736842 0.89285714 0.96428571
 0.94642857 0.89285714 0.875      0.92857143]

mean value: 0.9219298245614035

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.91525424 0.93103448 0.93333333 0.94736842 0.89655172 0.96296296
 0.94339623 0.88888889 0.87719298 0.93103448]

mean value: 0.9227017742052359

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.87096774 0.9        0.90322581 0.96428571 0.86666667 1.
 1.         0.92307692 0.86206897 0.9       ]

mean value: 0.9190291817933642

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.92857143 0.92857143
 0.89285714 0.85714286 0.89285714 0.96428571]

mean value: 0.9289408866995074

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.91317734 0.93041872 0.92918719 0.9476601  0.89285714 0.96428571
 0.94642857 0.89285714 0.875      0.92857143]

mean value: 0.9220443349753695

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.84375    0.87096774 0.875      0.9        0.8125     0.92857143
 0.89285714 0.8        0.78125    0.87096774]

mean value: 0.8575864055299539

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.33

Accuracy on Blind test: 0.71

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00839448 0.00779438 0.00789499 0.00782299 0.00828099 0.00764513
 0.0078218  0.00805545 0.00872326 0.00792527]

mean value: 0.008035874366760254

key: score_time
value: [0.0083375  0.00801182 0.00785446 0.0080626  0.0083189  0.00805974
 0.00803876 0.00792432 0.00809288 0.00801706]

mean value: 0.008071804046630859

key: test_mcc
value: [0.79161589 0.68850906 0.72133224 0.54592083 0.4645821  0.61065803
 0.79385662 0.68250015 0.64285714 0.62705445]

mean value: 0.65688865057222

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.89473684 0.84210526 0.85964912 0.77192982 0.73214286 0.80357143
 0.89285714 0.83928571 0.82142857 0.80357143]

mean value: 0.8261278195488722

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.89655172 0.84745763 0.85714286 0.78688525 0.72727273 0.79245283
 0.88461538 0.83018868 0.82142857 0.7755102 ]

mean value: 0.821950585113335

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.86666667 0.80645161 0.88888889 0.75       0.74074074 0.84
 0.95833333 0.88       0.82142857 0.9047619 ]

mean value: 0.8457271718723331

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.92857143 0.89285714 0.82758621 0.82758621 0.71428571 0.75
 0.82142857 0.78571429 0.82142857 0.67857143]

mean value: 0.8048029556650247

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.8953202  0.8429803  0.86022167 0.77093596 0.73214286 0.80357143
 0.89285714 0.83928571 0.82142857 0.80357143]

mean value: 0.826231527093596

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8125     0.73529412 0.75       0.64864865 0.57142857 0.65625
 0.79310345 0.70967742 0.6969697  0.63333333]

mean value: 0.7007205235658011

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.23

Accuracy on Blind test: 0.71

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.3137598  1.31105161 1.31397271 1.34022093 1.33495617 1.32320976
 1.32298803 1.31964326 1.33588552 1.33314967]

mean value: 1.3248837471008301

key: score_time
value: [0.09023738 0.0960989  0.0929544  0.09687686 0.09749436 0.092448
 0.09450769 0.09734035 0.09717226 0.09090662]

mean value: 0.09460368156433105

key: test_mcc
value: [0.92980296 0.92980296 0.8951918  0.9321832  0.85933785 0.96490128
 0.96490128 0.92857143 0.89342711 0.89342711]

mean value: 0.9191546978182543
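The fold-to-fold spread differs a lot between models. The single randomized Extra Tree logged above ranges from about 0.46 to 0.79 in test MCC, while this 1000-tree Random Forest stays between about 0.86 and 0.96. A short sketch comparing the sample standard deviations of the two sets of fold values, copied from this log:

```python
from statistics import mean, stdev

# Per-fold test_mcc values copied from the Extra Tree and Random Forest runs
extra_tree_mcc = [0.79161589, 0.68850906, 0.72133224, 0.54592083, 0.4645821,
                  0.61065803, 0.79385662, 0.68250015, 0.64285714, 0.62705445]
random_forest_mcc = [0.92980296, 0.92980296, 0.8951918, 0.9321832, 0.85933785,
                     0.96490128, 0.96490128, 0.92857143, 0.89342711, 0.89342711]

for name, folds in [("Extra Tree", extra_tree_mcc), ("Random Forest", random_forest_mcc)]:
    # stdev() is the sample (n - 1) standard deviation
    print(f"{name}: mean={mean(folds):.3f} sd={stdev(folds):.3f} "
          f"range=[{min(folds):.3f}, {max(folds):.3f}]")
```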

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 0.96491228 0.92857143 0.98214286
 0.98214286 0.96428571 0.94642857 0.94642857]

mean value: 0.9592105263157895

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.96428571 0.94915254 0.96428571 0.93103448 0.98245614
 0.98181818 0.96428571 0.94545455 0.94736842]

mean value: 0.9594427170950596

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.96428571 0.93333333 1.         0.9        0.96551724
 1.         0.96428571 0.96296296 0.93103448]

mean value: 0.958570516329137

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.96428571 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.9610837438423645

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96490148 0.96490148 0.94704433 0.96551724 0.92857143 0.98214286
 0.98214286 0.96428571 0.94642857 0.94642857]

mean value: 0.9592364532019705

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.93103448 0.90322581 0.93103448 0.87096774 0.96551724
 0.96428571 0.93103448 0.89655172 0.9       ]

mean value: 0.9224686159224535

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.2

Accuracy on Blind test: 0.5
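For the six models whose results appear in this part of the log, ranking by mean cross-validated test MCC and by blind-test MCC gives very different orders. Random Forest leads on CV but Extra Trees leads on the blind set, and every model loses most of its CV MCC on the blind data. The log alone cannot say why (a distribution shift between the CV data and the blind set is one possibility). A sketch that tabulates this from the logged numbers, typed in by hand:

```python
# (mean CV test MCC, blind-test MCC) per model, copied from this log section
results = {
    "SVM": (0.776601995146589, 0.22),
    "MLP": (0.8559010729919259, 0.26),
    "Decision Tree": (0.891020869070053, 0.11),
    "Extra Trees": (0.84624530759693, 0.33),
    "Extra Tree": (0.65688865057222, 0.23),
    "Random Forest": (0.9191546978182543, 0.2),
}

by_cv = sorted(results, key=lambda name: results[name][0], reverse=True)
by_blind = sorted(results, key=lambda name: results[name][1], reverse=True)
print("Ranked by CV test MCC:   ", by_cv)
print("Ranked by blind-test MCC:", by_blind)
for name, (cv_mcc, blind_mcc) in results.items():
    print(f"{name:15s} CV-to-blind MCC drop: {cv_mcc - blind_mcc:.2f}")
```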

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
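The `prep` step in the pipeline above rescales the numerical columns to [0, 1] with `MinMaxScaler`, one-hot encodes the categorical columns with `OneHotEncoder`, and passes any remaining columns through unchanged (`remainder='passthrough'`). A minimal sketch of an equivalent pipeline, assuming a small synthetic DataFrame that borrows three of the logged column names (`ligand_distance`, `rsa`, `ss_class`). The data values are made up, and this is an illustration rather than the script's own code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic toy data (made up); column names borrowed from the log
rng = np.random.default_rng(42)
n = 40
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
})
y = rng.integers(0, 2, n)

num_cols = list(X.select_dtypes(include="number").columns)
cat_cols = list(X.select_dtypes(exclude="number").columns)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",                 # unlisted columns are kept as-is
)

pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipe.fit(X, y)

# 2 scaled numeric columns + 3 one-hot columns for ss_class
Xt = pipe.named_steps["prep"].transform(X)
print(Xt.shape)
```

Because the scaler and encoder sit inside the pipeline, `cross_validate` refits them on each training fold, so the held-out fold never influences the scaling.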

key: fit_time
value: [0.87169266 0.93242407 0.9023416  1.00182915 0.90367198 0.91859269
 0.90991735 0.90380979 0.88591456 0.89217138]

mean value: 0.9122365236282348

key: score_time
value: [0.22929215 0.26355243 0.24770474 0.25745416 0.18990588 0.25744367
 0.27588391 0.26097345 0.2340591  0.21919918]

mean value: 0.24354686737060546

key: test_mcc
value: [0.8953202  0.92980296 0.8951918  0.9321832  0.85933785 0.96490128
 0.96490128 0.96490128 0.89342711 0.89342711]

mean value: 0.919339407234444

key: train_mcc
value: [0.95679178 0.94890036 0.94878539 0.94089544 0.9606597  0.94900279
 0.94112724 0.94499908 0.94888508 0.95687833]

mean value: 0.9496925191019135

key: test_accuracy
value: [0.94736842 0.96491228 0.94736842 0.96491228 0.92857143 0.98214286
 0.98214286 0.98214286 0.94642857 0.94642857]

mean value: 0.9592418546365915

key: train_accuracy
value: [0.97830375 0.97435897 0.97435897 0.9704142  0.98031496 0.97440945
 0.97047244 0.97244094 0.97440945 0.97834646]

mean value: 0.9747829598223299

key: test_fscore
value: [0.94736842 0.96428571 0.94915254 0.96428571 0.93103448 0.98245614
 0.98181818 0.98181818 0.94545455 0.94736842]

mean value: 0.959504234524998

key: train_fscore
value: [0.9785575  0.97465887 0.97445972 0.97053045 0.98039216 0.97465887
 0.97076023 0.97265625 0.97455969 0.9785575 ]

mean value: 0.9749791253024628

key: test_precision
value: [0.93103448 0.96428571 0.93333333 1.         0.9        0.96551724
 1.         1.         0.96296296 0.93103448]

mean value: 0.9588168217478562

key: train_precision
value: [0.96911197 0.96525097 0.96875    0.96484375 0.9765625  0.96525097
 0.96138996 0.96511628 0.9688716  0.96911197]

mean value: 0.9674259954516337

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.96428571 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.9610837438423645

key: train_recall
value: [0.98818898 0.98425197 0.98023715 0.97628458 0.98425197 0.98425197
 0.98031496 0.98031496 0.98031496 0.98818898]

mean value: 0.9826600479287916

key: test_roc_auc
value: [0.9476601  0.96490148 0.94704433 0.96551724 0.92857143 0.98214286
 0.98214286 0.98214286 0.94642857 0.94642857]

mean value: 0.9592980295566503

key: train_roc_auc
value: [0.97828421 0.97433942 0.97437055 0.97042576 0.98031496 0.97440945
 0.97047244 0.97244094 0.97440945 0.97834646]

mean value: 0.9747813637919767

key: test_jcc
value: [0.9        0.93103448 0.90322581 0.93103448 0.87096774 0.96551724
 0.96428571 0.96428571 0.89655172 0.9       ]

mean value: 0.9226902907993009

key: train_jcc
value: [0.95801527 0.95057034 0.95019157 0.94274809 0.96153846 0.95057034
 0.94318182 0.94676806 0.95038168 0.95801527]

mean value: 0.9511980901192165
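The `jcc` (Jaccard) scores carry no information beyond the F1 scores. For binary labels, Jaccard = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), which gives J = F1 / (2 - F1). For example, Random Forest2's first fold has F1 = 0.94736842, and 0.94736842 / 1.05263158 = 0.9, which matches the logged `test_jcc`. A short check using (test_fscore, test_jcc) pairs copied from this log:

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by a binary F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

# (test_fscore, test_jcc) pairs copied from the log
pairs = [
    (0.94736842, 0.9),         # Random Forest2, fold 1
    (0.93103448, 0.87096774),  # Random Forest2, fold 5
    (0.76595745, 0.62068966),  # Naive Bayes (BernoulliNB), fold 4
]

for f1, jcc in pairs:
    print(f"F1={f1:.8f}  implied J={jaccard_from_f1(f1):.8f}  logged J={jcc:.8f}")
```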

MCC on Blind test: 0.2

Accuracy on Blind test: 0.5
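The `FutureWarning` emitted while fitting Random Forest2 comes from `max_features='auto'`, which scikit-learn deprecated in 1.1 (removal planned for 1.3). As the warning itself says, setting `max_features='sqrt'` keeps the previous behaviour for classifiers. A minimal sketch of the same estimator with only that parameter changed. The other arguments are copied from the log, and the training data is synthetic, used only to confirm that the warning no longer fires.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same settings as 'Random Forest2' in the log, except max_features='sqrt'
rf2 = RandomForestClassifier(
    max_features="sqrt",  # replaces deprecated 'auto'; same behaviour for classifiers
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

# Synthetic data, only to check that fitting raises no max_features warning
X, y = make_classification(n_samples=100, n_features=10, random_state=42)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rf2.fit(X, y)

max_features_warnings = [w for w in caught if "max_features" in str(w.message)]
print(len(max_features_warnings), rf2.oob_score_)
```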

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.02086926 0.00807834 0.00846243 0.00790358 0.00789189 0.0084095
 0.00794506 0.00821877 0.00814724 0.00783157]

mean value: 0.009375762939453126

key: score_time
value: [0.01121163 0.00862956 0.00874233 0.00838256 0.00837135 0.00809836
 0.00858903 0.00843906 0.00876284 0.00867867]

mean value: 0.008790540695190429

key: test_mcc
value: [0.79778885 0.72706729 0.79110556 0.66755025 0.71611487 0.78772636
 0.79385662 0.75047877 0.67900461 0.75047877]

mean value: 0.7461171974035183

key: train_mcc
value: [0.77122271 0.76334013 0.76731664 0.68276748 0.78361641 0.76800824
 0.76819892 0.77588525 0.78361641 0.77574087]

mean value: 0.763971305717051

key: test_accuracy
value: [0.89473684 0.85964912 0.89473684 0.80701754 0.85714286 0.89285714
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.868828320802005

key: train_accuracy
value: [0.88560158 0.8816568  0.88362919 0.84023669 0.89173228 0.88385827
 0.88385827 0.88779528 0.89173228 0.88779528]

mean value: 0.8817895913898336

key: test_fscore
value: [0.9        0.86666667 0.9        0.76595745 0.86206897 0.88888889
 0.88461538 0.87272727 0.84210526 0.87719298]

mean value: 0.8660222870838

key: train_fscore
value: [0.88627451 0.88142292 0.88408644 0.83298969 0.89278752 0.88543689
 0.88588008 0.88932039 0.89278752 0.88888889]

mean value: 0.8819874865979285

key: test_precision
value: [0.84375    0.8125     0.87096774 1.         0.83333333 0.92307692
 0.95833333 0.88888889 0.82758621 0.86206897]

mean value: 0.8820505392981756

key: train_precision
value: [0.8828125  0.88492063 0.87890625 0.87068966 0.88416988 0.87356322
 0.87072243 0.87739464 0.88416988 0.88030888]

mean value: 0.8787657976607903

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.62068966 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8623152709359606

key: train_recall
value: [0.88976378 0.87795276 0.88932806 0.79841897 0.9015748  0.8976378
 0.9015748  0.9015748  0.9015748  0.8976378 ]

mean value: 0.8857038374155799

key: test_roc_auc
value: [0.89593596 0.86083744 0.89408867 0.81034483 0.85714286 0.89285714
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.8693349753694581
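The `test_roc_auc` values track accuracy closely. That pattern fits AUC being computed from hard 0/1 predictions rather than from probabilities, although this is an assumption because the scorer definition is not shown in the log. With a single threshold, the ROC curve has one interior point (FPR, TPR), and the trapezoidal area reduces to (TPR + TNR) / 2, which is balanced accuracy. For example, Naive Bayes fold 4 has precision 1.0 (no false positives, so TNR = 1) and recall 0.62068966, giving (0.62068966 + 1) / 2 = 0.81034483, the logged value. A pure-Python check, with the fold's confusion counts (18 TP, 11 FN, 28 TN, 0 FP) reconstructed from its logged recall, precision, and accuracy on 57 samples:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp

def auc_hard_labels(y_true, y_pred):
    """Trapezoidal ROC AUC when the scores are hard 0/1 predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    tpr, fpr = tp / (tp + fn), fp / (fp + tn)
    # ROC points (0, 0) -> (fpr, tpr) -> (1, 1): sum of the two trapezoids
    return fpr * tpr / 2 + (1 - fpr) * (tpr + 1) / 2

def balanced_accuracy(y_true, y_pred):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# Reconstructed Naive Bayes fold 4: 29 positives, 28 negatives
y_true = [1] * 29 + [0] * 28
y_pred = [1] * 18 + [0] * 11 + [0] * 28
print(round(auc_hard_labels(y_true, y_pred), 8), round(balanced_accuracy(y_true, y_pred), 8))
```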

key: train_roc_auc
value: [0.88559335 0.88166412 0.88364041 0.84015437 0.89173228 0.88385827
 0.88385827 0.88779528 0.89173228 0.88779528]

mean value: 0.881782390837509

key: test_jcc
value: [0.81818182 0.76470588 0.81818182 0.62068966 0.75757576 0.8
 0.79310345 0.77419355 0.72727273 0.78125   ]

mean value: 0.7655154655400436

key: train_jcc
value: [0.79577465 0.78798587 0.79225352 0.71378092 0.80633803 0.79442509
 0.79513889 0.8006993  0.80633803 0.8       ]

mean value: 0.7892734286500613

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71
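Each model block reports per-fold scores as `key:` / `value:` / `mean value:` triples. The keys are `fit_time`, `score_time`, and `test_`/`train_` pairs, which is the dictionary layout returned by scikit-learn's `cross_validate` with `return_train_score=True`. A minimal sketch of how such output could be produced, assuming 10 stratified folds and a scoring dictionary whose names (`mcc`, `fscore`, `jcc`, and so on) are inferred from the logged keys. The script's actual scorer definitions are not shown in the log, and the data here is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB

# Assumed scorer names, chosen to reproduce keys such as test_mcc and train_jcc
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard predictions (assumption)
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the training data
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(BernoulliNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```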

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.06513762 0.05543566 0.05926275 0.05869985 0.05539632 0.05809283
 0.05878782 0.06239796 0.0591898  0.21499252]

mean value: 0.07473931312561036

key: score_time
value: [0.01001787 0.00966692 0.00963521 0.00965786 0.0098114  0.0097878
 0.00974226 0.00981474 0.00976157 0.01011968]

mean value: 0.009801530838012695

key: test_mcc
value: [0.92980296 0.92980296 0.92980296 0.96551724 0.82618439 0.93094934
 1.         0.92857143 0.96490128 0.89342711]

mean value: 0.9298959656239084

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.98245614 0.91071429 0.96428571
 1.         0.96428571 0.98214286 0.94642857]

mean value: 0.9645050125313284

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.96428571 0.96551724 0.98245614 0.91525424 0.96551724
 1.         0.96428571 0.98181818 0.94736842]

mean value: 0.965078860612559

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.96428571 0.96551724 1.         0.87096774 0.93333333
 1.         0.96428571 1.         0.93103448]

mean value: 0.9593709942263892

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.96428571 1.
 1.         0.96428571 0.96428571 0.96428571]

mean value: 0.9716748768472907

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96490148 0.96490148 0.96490148 0.98275862 0.91071429 0.96428571
 1.         0.96428571 0.98214286 0.94642857]

mean value: 0.9645320197044336

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.93103448 0.93333333 0.96551724 0.84375    0.93333333
 1.         0.93103448 0.96428571 0.9       ]

mean value: 0.9333323070607553

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.37
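After cross-validation, each model is also scored once on a separate blind test set, with MCC and accuracy rounded to two decimals. The gap is large for XGBoost: mean cross-validated test MCC is about 0.93, but blind-test MCC is 0.1 and blind-test accuracy is 0.37. A minimal sketch of the blind-test step, assuming the model is refitted on the full training set and scored on a held-out pair named `X_bts` / `y_bts`. These names are hypothetical, and the split here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic data split into a training set and a held-out "blind" set
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_bts, y_train, y_bts = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = BernoulliNB().fit(X_train, y_train)  # refit on the whole training set
y_pred = model.predict(X_bts)                # score once on unseen data

bts_mcc = round(matthews_corrcoef(y_bts, y_pred), 2)
bts_acc = round(accuracy_score(y_bts, y_pred), 2)
print(f"MCC on Blind test: {bts_mcc}\n")
print(f"Accuracy on Blind test: {bts_acc}")
```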

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.0158906  0.04133749 0.04191256 0.04180241 0.04169393 0.04146218
 0.04147553 0.03954792 0.04155207 0.04123378]

mean value: 0.03879084587097168

key: score_time
value: [0.010324   0.01102901 0.0110786  0.01924562 0.02003503 0.02148247
 0.01090479 0.02042723 0.02186942 0.01970553]

mean value: 0.016610169410705568

key: test_mcc
value: [0.82512315 0.76689254 0.79110556 0.9321832  0.75434227 0.82195294
 0.89802651 0.85933785 0.67900461 0.82195294]

mean value: 0.8149921569819407

key: train_mcc
value: [0.87014673 0.87419439 0.85823465 0.85931426 0.87499279 0.85486752
 0.83910959 0.86274648 0.87089581 0.85105352]

mean value: 0.8615555753216068

key: test_accuracy
value: [0.9122807  0.87719298 0.89473684 0.96491228 0.875      0.91071429
 0.94642857 0.92857143 0.83928571 0.91071429]

mean value: 0.905983709273183

key: train_accuracy
value: [0.93491124 0.93688363 0.92899408 0.92899408 0.93700787 0.92716535
 0.91929134 0.93110236 0.93503937 0.92519685]

mean value: 0.9304586187081645

key: test_fscore
value: [0.9122807  0.8852459  0.9        0.96428571 0.88135593 0.90909091
 0.94339623 0.92592593 0.84210526 0.90909091]

mean value: 0.9072777483563568

key: train_fscore
value: [0.93592233 0.9379845  0.9296875  0.93076923 0.93846154 0.92843327
 0.92069632 0.93230174 0.93641618 0.92664093]

mean value: 0.9317313541686736

key: test_precision
value: [0.89655172 0.81818182 0.87096774 1.         0.83870968 0.92592593
 1.         0.96153846 0.82758621 0.92592593]

mean value: 0.9065387481961453

key: train_precision
value: [0.92337165 0.92366412 0.91891892 0.90636704 0.91729323 0.91254753
 0.90494297 0.91634981 0.91698113 0.90909091]

mean value: 0.9149527308196

key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
 0.89285714 0.89285714 0.85714286 0.89285714]

mean value: 0.9112068965517242

key: train_recall
value: [0.9488189  0.95275591 0.94071146 0.95652174 0.96062992 0.94488189
 0.93700787 0.9488189  0.95669291 0.94488189]

mean value: 0.9491721390557406

key: test_roc_auc
value: [0.91256158 0.87869458 0.89408867 0.96551724 0.875      0.91071429
 0.94642857 0.92857143 0.83928571 0.91071429]

mean value: 0.9061576354679803

key: train_roc_auc
value: [0.93488376 0.93685226 0.92901715 0.92904827 0.93700787 0.92716535
 0.91929134 0.93110236 0.93503937 0.92519685]

mean value: 0.9304604587470044

key: test_jcc
value: [0.83870968 0.79411765 0.81818182 0.93103448 0.78787879 0.83333333
 0.89285714 0.86206897 0.72727273 0.83333333]

mean value: 0.8318787915611183

key: train_jcc
value: [0.87956204 0.88321168 0.86861314 0.8705036  0.88405797 0.86642599
 0.85304659 0.87318841 0.88043478 0.86330935]

mean value: 0.8722353558136309

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7

Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01021671 0.01008439 0.00848007 0.00817442 0.00810814 0.00807619
 0.00805545 0.00815272 0.00819159 0.00817704]

mean value: 0.008571672439575195

key: score_time
value: [0.01060987 0.00964761 0.0087111  0.00849438 0.00838518 0.00844193
 0.00843549 0.00848222 0.0084908  0.00846648]

mean value: 0.00881650447845459

key: test_mcc
value: [0.79778885 0.72706729 0.79110556 0.89988258 0.71611487 0.78772636
 0.79385662 0.75047877 0.67900461 0.75047877]

mean value: 0.7693504297544995

key: train_mcc
value: [0.76726164 0.78700923 0.77122983 0.7514861  0.77955173 0.77186893
 0.77203657 0.77574087 0.78351922 0.77174925]

mean value: 0.7731453348144388

key: test_accuracy
value: [0.89473684 0.85964912 0.89473684 0.94736842 0.85714286 0.89285714
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.8828634085213033

key: train_accuracy
value: [0.88362919 0.89349112 0.88560158 0.87573964 0.88976378 0.88582677
 0.88582677 0.88779528 0.89173228 0.88582677]

mean value: 0.8865233192004845

key: test_fscore
value: [0.9        0.86666667 0.9        0.94545455 0.86206897 0.88888889
 0.88461538 0.87272727 0.84210526 0.87719298]

mean value: 0.8839719969484034

key: train_fscore
value: [0.88408644 0.89328063 0.88582677 0.87573964 0.89019608 0.88715953
 0.8875969  0.88888889 0.89236791 0.88671875]

mean value: 0.8871861548728417

key: test_precision
value: [0.84375    0.8125     0.87096774 1.         0.83333333 0.92307692
 0.95833333 0.88888889 0.82758621 0.86206897]

mean value: 0.8820505392981756

key: train_precision
value: [0.88235294 0.8968254  0.88235294 0.87401575 0.88671875 0.87692308
 0.8740458  0.88030888 0.88715953 0.87984496]

mean value: 0.8820548030282749

key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
 0.82142857 0.85714286 0.85714286 0.89285714]

mean value: 0.8899014778325123

key: train_recall
value: [0.88582677 0.88976378 0.88932806 0.87747036 0.89370079 0.8976378
 0.9015748  0.8976378  0.8976378  0.89370079]

mean value: 0.8924278733932962

key: test_roc_auc
value: [0.89593596 0.86083744 0.89408867 0.94827586 0.85714286 0.89285714
 0.89285714 0.875      0.83928571 0.875     ]

mean value: 0.883128078817734

key: train_roc_auc
value: [0.88362485 0.89349849 0.88560891 0.87574305 0.88976378 0.88582677
 0.88582677 0.88779528 0.89173228 0.88582677]

mean value: 0.8865246957766643

key: test_jcc
value: [0.81818182 0.76470588 0.81818182 0.89655172 0.75757576 0.8
 0.79310345 0.77419355 0.72727273 0.78125   ]

mean value: 0.7931016724365952

key: train_jcc
value: [0.79225352 0.80714286 0.795053   0.77894737 0.80212014 0.7972028
 0.79790941 0.8        0.80565371 0.79649123]

mean value: 0.7972774034752823

MCC on Blind test: 0.28

Accuracy on Blind test: 0.71

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01202488 0.01277637 0.0122776  0.01319098 0.013026   0.01311898
 0.01343989 0.01440072 0.01275349 0.01293349]

mean value: 0.01299424171447754

key: score_time
value: [0.00864363 0.00991464 0.0099988  0.01055336 0.01052094 0.01076031
 0.01056218 0.01061869 0.01053238 0.0105021 ]

mean value: 0.010260701179504395

key: test_mcc
value: [0.7589669  0.82942474 0.30469361 0.9321832  0.26997462 0.6882472
 0.26997462 0.76225171 0.82195294 0.85933785]

mean value: 0.649700739821353

key: train_mcc
value: [0.88439556 0.87825675 0.35307124 0.8935508  0.46259784 0.65176051
 0.33210739 0.86516672 0.88616336 0.86094079]

mean value: 0.7068010966004633

key: test_accuracy
value: [0.87719298 0.9122807  0.57894737 0.96491228 0.58928571 0.82142857
 0.58928571 0.875      0.91071429 0.92857143]

mean value: 0.8047619047619048

key: train_accuracy
value: [0.9408284  0.93885602 0.61143984 0.94674556 0.68110236 0.8011811
 0.6023622  0.93110236 0.94291339 0.92913386]

mean value: 0.8325665098075758

key: test_fscore
value: [0.88135593 0.91525424 0.29411765 0.96428571 0.7012987  0.84848485
 0.7012987  0.8852459  0.9122807  0.93103448]

mean value: 0.8034656868070665

key: train_fscore
value: [0.94318182 0.94003868 0.36245955 0.94632207 0.75675676 0.83305785
 0.71468927 0.93383743 0.94211577 0.93181818]

mean value: 0.8304277370347289

key: test_precision
value: [0.83870968 0.87096774 1.         1.         0.55102041 0.73684211
 0.55102041 0.81818182 0.89655172 0.9       ]

mean value: 0.8163293883264277

key: train_precision
value: [0.90875912 0.92395437 1.         0.952      0.61165049 0.71794872
 0.55726872 0.89818182 0.95546559 0.89781022]

mean value: 0.8423039046768191

key: test_recall
value: [0.92857143 0.96428571 0.17241379 0.93103448 0.96428571 1.
 0.96428571 0.96428571 0.92857143 0.96428571]

mean value: 0.8782019704433498

key: train_recall
value: [0.98031496 0.95669291 0.22134387 0.94071146 0.99212598 0.99212598
 0.99606299 0.97244094 0.92913386 0.96850394]

mean value: 0.8949456910771529

key: test_roc_auc
value: [0.87807882 0.91317734 0.5862069  0.96551724 0.58928571 0.82142857
 0.58928571 0.875      0.91071429 0.92857143]

mean value: 0.8057266009852218

key: train_roc_auc
value: [0.94075037 0.93882076 0.61067194 0.94673368 0.68110236 0.8011811
 0.6023622  0.93110236 0.94291339 0.92913386]

mean value: 0.832477202701441

key: test_jcc
value: [0.78787879 0.84375    0.17241379 0.93103448 0.54       0.73684211
 0.54       0.79411765 0.83870968 0.87096774]

mean value: 0.7055714235417677

key: train_jcc
value: [0.89247312 0.88686131 0.22134387 0.89811321 0.60869565 0.71388102
 0.55604396 0.87588652 0.89056604 0.87234043]

mean value: 0.7416205129351496

MCC on Blind test: 0.17

Accuracy on Blind test: 0.53

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01449275 0.01378703 0.01423025 0.01384568 0.01323628 0.01458144
 0.01555753 0.01341605 0.01425099 0.01479554]

mean value: 0.014219355583190919

key: score_time
value: [0.01101327 0.01102638 0.01097178 0.01104283 0.01102424 0.01096249
 0.01104045 0.01096034 0.01094913 0.01095819]

mean value: 0.010994911193847656

key: test_mcc
value: [0.85960591 0.92980296 0.8615634  0.9321832  0.76225171 0.96490128
 0.93094934 0.89342711 0.78772636 0.82618439]

mean value: 0.8748595658423624

key: train_mcc
value: [0.90933143 0.9215681  0.86053354 0.89231105 0.86150531 0.91030286
 0.90951226 0.87252327 0.91349911 0.86883933]

mean value: 0.8919926257747095

key: test_accuracy
value: [0.92982456 0.96491228 0.92982456 0.96491228 0.875      0.98214286
 0.96428571 0.94642857 0.89285714 0.91071429]

mean value: 0.9360902255639098

key: train_accuracy
value: [0.95463511 0.96055227 0.9270217  0.94477318 0.92913386 0.95472441
 0.95472441 0.93503937 0.95669291 0.93307087]

mean value: 0.9450368075292364

key: test_fscore
value: [0.92857143 0.96428571 0.93333333 0.96428571 0.8852459  0.98181818
 0.96296296 0.94736842 0.89655172 0.91525424]

mean value: 0.9379677619375378

key: train_fscore
value: [0.95499022 0.96       0.9310987  0.94238683 0.93207547 0.95372233
 0.95445545 0.9373814  0.95703125 0.93560606]

mean value: 0.9458747709029058

key: test_precision
value: [0.92857143 0.96428571 0.90322581 1.         0.81818182 1.
 1.         0.93103448 0.86666667 0.87096774]

mean value: 0.9282933658851346

key: train_precision
value: [0.94941634 0.97560976 0.88028169 0.98283262 0.89492754 0.97530864
 0.96015936 0.9047619  0.9496124  0.90145985]

mean value: 0.937437010931088

key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.96428571 0.96428571
 0.92857143 0.96428571 0.92857143 0.96428571]

mean value: 0.9503694581280788

key: train_recall
value: [0.96062992 0.94488189 0.98814229 0.90513834 0.97244094 0.93307087
 0.9488189  0.97244094 0.96456693 0.97244094]

mean value: 0.9562571970993744

key: test_roc_auc
value: [0.92980296 0.96490148 0.92918719 0.96551724 0.875      0.98214286
 0.96428571 0.94642857 0.89285714 0.91071429]

mean value: 0.9360837438423646

key: train_roc_auc
value: [0.95462326 0.96058324 0.92714201 0.94469515 0.92913386 0.95472441
 0.95472441 0.93503937 0.95669291 0.93307087]

mean value: 0.9450429491768074

key: test_jcc
value: [0.86666667 0.93103448 0.875      0.93103448 0.79411765 0.96428571
 0.92857143 0.9        0.8125     0.84375   ]

mean value: 0.8846960422099874

key: train_jcc
value: [0.91385768 0.92307692 0.87108014 0.89105058 0.87279152 0.91153846
 0.91287879 0.88214286 0.917603   0.87900356]

mean value: 0.8975023504978233

MCC on Blind test: 0.12

Accuracy on Blind test: 0.42

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11413288 0.1020143  0.10187316 0.10205579 0.10220885 0.10228324
 0.10212898 0.10206866 0.10222936 0.10228419]

mean value: 0.10332794189453125

key: score_time
value: [0.01537633 0.01542163 0.01549554 0.01547527 0.01547194 0.01569366
 0.01554465 0.01544762 0.01546764 0.01553702]

mean value: 0.015493130683898926

key: test_mcc
value: [0.92980296 0.8951918  0.96547546 0.96551724 0.82618439 1.
 1.         0.92857143 0.92857143 0.89342711]

mean value: 0.9332741814992628

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.94736842 0.98245614 0.98245614 0.91071429 1.
 1.         0.96428571 0.96428571 0.94642857]

mean value: 0.9662907268170426

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.94545455 0.98305085 0.98245614 0.91525424 1.
 1.         0.96428571 0.96428571 0.94736842]

mean value: 0.966644133446096

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.96296296 0.96666667 1.         0.87096774 1.
 1.         0.96428571 0.96428571 0.93103448]

mean value: 0.9624488997180877

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.92857143 1.         0.96551724 0.96428571 1.
 1.         0.96428571 0.96428571 0.96428571]

mean value: 0.971551724137931

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96490148 0.94704433 0.98214286 0.98275862 0.91071429 1.
 1.         0.96428571 0.96428571 0.94642857]

mean value: 0.966256157635468

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.89655172 0.96666667 0.96551724 0.84375    1.
 1.         0.93103448 0.93103448 0.9       ]

mean value: 0.936558908045977

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.39

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]

key: fit_time
value: [0.03648138 0.03345275 0.03663468 0.03281045 0.03574181 0.03963566
 0.04988575 0.04006696 0.03826404 0.04667592]

mean value: 0.03896493911743164

key: score_time
value: [0.0171628  0.0221951  0.02012706 0.02416539 0.02962255 0.03259301
 0.0243876  0.07890892 0.02576041 0.01878405]

mean value: 0.029370689392089845

key: test_mcc
value: [0.92980296 0.8951918  0.8951918  1.         0.82195294 0.89802651
 0.96490128 0.92857143 0.92857143 0.89342711]

mean value: 0.915563726523076

key: train_mcc
value: [0.99606293 0.98425123 0.97636129 0.99606299 0.98819663 0.99607071
 0.99212598 0.98819663 0.98819663 0.99607071]

mean value: 0.9901595758977872

key: test_accuracy
value: [0.96491228 0.94736842 0.94736842 1.         0.91071429 0.94642857
 0.98214286 0.96428571 0.96428571 0.94642857]

mean value: 0.9573934837092731

key: train_accuracy
value: [0.99802761 0.99211045 0.98816568 0.99802761 0.99409449 0.9980315
 0.99606299 0.99409449 0.99409449 0.9980315 ]

mean value: 0.9950740809765644

key: test_fscore
value: [0.96428571 0.94545455 0.94915254 1.         0.9122807  0.94915254
 0.98181818 0.96428571 0.96428571 0.94736842]

mean value: 0.957808407768265

key: train_fscore
value: [0.99803536 0.99215686 0.98809524 0.99802761 0.99410609 0.99803536
 0.99606299 0.99408284 0.99410609 0.99802761]

mean value: 0.9950736067689547

key: test_precision
value: [0.96428571 0.96296296 0.93333333 1.         0.89655172 0.90322581
 1.         0.96428571 0.96428571 0.93103448]

mean value: 0.9519965452501604

key: train_precision
value: [0.99607843 0.98828125 0.99203187 0.99606299 0.99215686 0.99607843
 0.99606299 0.99604743 0.99215686 1.        ]

mean value: 0.9944957125827263

key: test_recall
value: [0.96428571 0.92857143 0.96551724 1.         0.92857143 1.
 0.96428571 0.96428571 0.96428571 0.96428571]

mean value: 0.9644088669950739

key: train_recall
value: [1.         0.99606299 0.98418972 1.         0.99606299 1.
 0.99606299 0.99212598 0.99606299 0.99606299]

mean value: 0.9956630668202048

key: test_roc_auc
value: [0.96490148 0.94704433 0.94704433 1.         0.91071429 0.94642857
 0.98214286 0.96428571 0.96428571 0.94642857]

mean value: 0.9573275862068966

key: train_roc_auc
value: [0.99802372 0.99210264 0.98815785 0.9980315  0.99409449 0.9980315
 0.99606299 0.99409449 0.99409449 0.9980315 ]

mean value: 0.9950725156391025

key: test_jcc
value: [0.93103448 0.89655172 0.90322581 1.         0.83870968 0.90322581
 0.96428571 0.93103448 0.93103448 0.9       ]

mean value: 0.9199102177022088

key: train_jcc
value: [0.99607843 0.9844358  0.97647059 0.99606299 0.98828125 0.99607843
 0.99215686 0.98823529 0.98828125 0.99606299]

mean value: 0.9902143889760475

MCC on Blind test: 0.09

Accuracy on Blind test: 0.38

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.16053271 0.1920464  0.18025184 0.14875555 0.09875679 0.1012013
 0.16610289 0.17586374 0.18154573 0.15119648]

mean value: 0.1556253433227539

key: score_time
value: [0.02007532 0.02150774 0.02181888 0.01334596 0.02305841 0.01333928
 0.02570271 0.02890587 0.01332211 0.02642632]

mean value: 0.020750260353088378

key: test_mcc
value: [0.76689254 0.79778885 0.75462449 0.8953202  0.71611487 0.78772636
 0.82618439 0.75047877 0.71611487 0.78772636]

mean value: 0.7798971716204695

key: train_mcc
value: [0.84667632 0.85019923 0.85012683 0.84728344 0.85850727 0.84698856
 0.8231473  0.84725158 0.8742597  0.86237183]

mean value: 0.8506812043667749

key: test_accuracy
value: [0.87719298 0.89473684 0.87719298 0.94736842 0.85714286 0.89285714
 0.91071429 0.875      0.85714286 0.89285714]

mean value: 0.8882205513784461

key: train_accuracy
value: [0.92307692 0.92504931 0.92504931 0.92307692 0.92913386 0.92322835
 0.91141732 0.92322835 0.93700787 0.93110236]

mean value: 0.9251370575719455

key: test_fscore
value: [0.8852459  0.9        0.88135593 0.94736842 0.86206897 0.88888889
 0.90566038 0.87272727 0.86206897 0.89655172]

mean value: 0.8901936449042431

key: train_fscore
value: [0.9245648  0.92578125 0.92519685 0.92485549 0.92996109 0.9245648
 0.91262136 0.92485549 0.93774319 0.93177388]

mean value: 0.9261918195384349

key: test_precision
value: [0.81818182 0.84375    0.86666667 0.96428571 0.83333333 0.92307692
 0.96       0.88888889 0.83333333 0.86666667]

mean value: 0.8798183344433345

key: train_precision
value: [0.90874525 0.91860465 0.92156863 0.90225564 0.91923077 0.90874525
 0.90038314 0.90566038 0.92692308 0.92277992]

mean value: 0.9134896700062805

key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.85714286
 0.85714286 0.85714286 0.89285714 0.92857143]

mean value: 0.9041871921182266

key: train_recall
value: [0.94094488 0.93307087 0.92885375 0.9486166  0.94094488 0.94094488
 0.92519685 0.94488189 0.9488189  0.94094488]

mean value: 0.9393218387227288

key: test_roc_auc
value: [0.87869458 0.89593596 0.87684729 0.9476601  0.85714286 0.89285714
 0.91071429 0.875      0.85714286 0.89285714]

mean value: 0.8884852216748769

key: train_roc_auc
value: [0.92304161 0.92503346 0.9250568  0.9231272  0.92913386 0.92322835
 0.91141732 0.92322835 0.93700787 0.93110236]

mean value: 0.9251377174691109

key: test_jcc
value: [0.79411765 0.81818182 0.78787879 0.9        0.75757576 0.8
 0.82758621 0.77419355 0.75757576 0.8125    ]

mean value: 0.8029609523554593

key: train_jcc
value: [0.85971223 0.86181818 0.86080586 0.86021505 0.86909091 0.85971223
 0.83928571 0.86021505 0.88278388 0.87226277]

mean value: 0.8625901890465713

MCC on Blind test: 0.29

Accuracy on Blind test: 0.72
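The `Running model pipeline` printouts describe a `Pipeline` with two steps. Its `prep` step is a `ColumnTransformer` that min-max scales the numerical columns, one-hot encodes the categorical ones, and passes any remaining columns through. The `model` step is the classifier. The sketch below rebuilds that structure on a small hypothetical DataFrame. The column names are borrowed from the log. The values are invented, and so is the choice to derive `num_cols` and `cat_cols` from dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numeric and one categorical column
# (names taken from the log, values invented).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 60),
    "rsa": rng.uniform(0, 1, 60),
    "ss_class": ["helix", "sheet", "loop"] * 20,
})
y = np.array([0, 1] * 30)

# Assumption: split columns by dtype; the script may build these lists differently.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", GaussianProcessClassifier(random_state=42))])
pipe.fit(X, y)

# 2 scaled numeric columns + 3 one-hot columns for ss_class.
Xt = pipe.named_steps["prep"].transform(X)
print(Xt.shape)
```

The scaler and encoder sit inside the pipeline, so they are refitted on each training fold during cross-validation. This keeps fold-specific min/max values from leaking into the test fold.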

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.27303624 0.25770164 0.24925447 0.25132322 0.24955368 0.25238132
 0.25456405 0.25274968 0.25270486 0.25955772]

mean value: 0.25528268814086913

key: score_time
value: [0.00924158 0.0084126  0.00870824 0.00866604 0.0086503  0.0085566
 0.00878334 0.00932741 0.00889683 0.00852871]

mean value: 0.008777165412902832

key: test_mcc
value: [0.92980296 0.92980296 0.8951918  1.         0.82195294 0.93094934
 1.         0.89342711 0.96490128 0.92857143]

mean value: 0.9294599815844486

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 1.         0.91071429 0.96428571
 1.         0.94642857 0.98214286 0.96428571]

mean value: 0.9645050125313284

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96428571 0.96428571 0.94915254 1.         0.9122807  0.96551724
 1.         0.94545455 0.98181818 0.96428571]

mean value: 0.9647080355636448

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96428571 0.96428571 0.93333333 1.         0.89655172 0.93333333
 1.         0.96296296 1.         0.96428571]

mean value: 0.9619038496624703

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96428571 0.96428571 0.96551724 1.         0.92857143 1.
 1.         0.92857143 0.96428571 0.96428571]

mean value: 0.9679802955665024

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96490148 0.96490148 0.94704433 1.         0.91071429 0.96428571
 1.         0.94642857 0.98214286 0.96428571]

mean value: 0.9644704433497537

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93103448 0.93103448 0.90322581 1.         0.83870968 0.93333333
 1.         0.89655172 0.96428571 0.93103448]

mean value: 0.9329209703903808

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.3
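The `key:`/`value:`/`mean value:` blocks match the dictionary returned by scikit-learn's `cross_validate` with `return_train_score=True`. That dictionary holds `fit_time` and `score_time`, plus one `test_*` and one `train_*` array per scorer, with ten entries per array for ten folds.

The sketch below reproduces this layout on synthetic data. It rests on two assumptions that the log does not confirm:
- scorer names such as `mcc`, `fscore` and `jcc`, chosen here to match the printed keys;
- a stratified 10-fold split.

The Gradient Boosting block above is a useful case. Every `train_*` score is 1.0, while mean `test_mcc` is about 0.93 and blind-test MCC is 0.1. Comparing the train and test arrays fold by fold is therefore a quick check for overfitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names picked so the result keys look like the log (test_mcc, train_jcc, ...).
# Assumption: the script's actual scorer definitions may differ.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the 557-row training data.
X, y = make_classification(n_samples=557, n_features=20, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(GradientBoostingClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key/value/mean layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```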

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01269507 0.0144012  0.01376104 0.01396441 0.01427913 0.01633024
 0.01445484 0.01412106 0.01710892 0.01414371]

mean value: 0.014525961875915528

key: score_time
value: [0.01109529 0.01073503 0.01095819 0.01101351 0.01113129 0.01230645
 0.01111698 0.01105785 0.01199865 0.01175356]

mean value: 0.011316680908203125

key: test_mcc
value: [0.5149026  0.65634573 0.65634573 0.76689254 0.67900461 0.57735027
 0.83484711 0.64285714 0.56573571 0.71611487]

mean value: 0.6610396314952282

key: train_mcc
value: [0.76157807 0.80278863 0.76582615 0.80208917 0.81501748 0.8019582
 0.81112421 0.82360735 0.73708689 0.76803489]

mean value: 0.78891110313645

key: test_accuracy
value: [0.75438596 0.8245614  0.8245614  0.87719298 0.83928571 0.78571429
 0.91071429 0.82142857 0.76785714 0.85714286]

mean value: 0.8262844611528822

key: train_accuracy
value: [0.87573964 0.90138067 0.87968442 0.89940828 0.90748031 0.8996063
 0.90551181 0.91141732 0.86417323 0.87992126]

mean value: 0.8924323253971952

key: test_fscore
value: [0.76666667 0.83333333 0.81481481 0.86792453 0.84210526 0.76923077
 0.90196078 0.82142857 0.8        0.85185185]

mean value: 0.8269316583099514

key: train_fscore
value: [0.86509636 0.90118577 0.87103594 0.89440994 0.90693069 0.89527721
 0.9047619  0.90945674 0.8738574  0.87048832]

mean value: 0.8892500281591235

key: test_precision
value: [0.71875    0.78125    0.88       0.95833333 0.82758621 0.83333333
 1.         0.82142857 0.7027027  0.88461538]

mean value: 0.8407999532309878

key: train_precision
value: [0.94835681 0.9047619  0.93636364 0.93913043 0.9123506  0.93562232
 0.912      0.93004115 0.81569966 0.94470046]

mean value: 0.9179026970421955

key: test_recall
value: [0.82142857 0.89285714 0.75862069 0.79310345 0.85714286 0.71428571
 0.82142857 0.82142857 0.92857143 0.82142857]

mean value: 0.8230295566502464

key: train_recall
value: [0.79527559 0.8976378  0.81422925 0.85375494 0.9015748  0.85826772
 0.8976378  0.88976378 0.94094488 0.80708661]

mean value: 0.8656173166101273

key: test_roc_auc
value: [0.75554187 0.82573892 0.82573892 0.87869458 0.83928571 0.78571429
 0.91071429 0.82142857 0.76785714 0.85714286]

mean value: 0.8267857142857143

key: train_roc_auc
value: [0.87589866 0.90138807 0.87955557 0.89931842 0.90748031 0.8996063
 0.90551181 0.91141732 0.86417323 0.87992126]

mean value: 0.892427095328499

key: test_jcc
value: [0.62162162 0.71428571 0.6875     0.76666667 0.72727273 0.625
 0.82142857 0.6969697  0.66666667 0.74193548]

mean value: 0.7069347148782633

key: train_jcc
value: [0.76226415 0.82014388 0.77153558 0.80898876 0.82971014 0.81040892
 0.82608696 0.83394834 0.77597403 0.77067669]

mean value: 0.8009737460973876

MCC on Blind test: 0.31

Accuracy on Blind test: 0.68
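QDA emitted `UserWarning: Variables are collinear` during fitting. This means at least one per-class covariance matrix was rank-deficient. One plausible contributor is the one-hot encoded block: the indicator columns for each categorical feature sum to 1, so they are linearly dependent.

A common mitigation is `QuadraticDiscriminantAnalysis(reg_param=...)`, which blends each class covariance with the identity matrix. The sketch below is a minimal example of that. The toy data with a duplicated column and the value 0.1 are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)

# Toy data: the fourth column duplicates the first, so each class
# covariance matrix is singular (the situation the warning reports).
base = rng.normal(size=(200, 3))
X = np.hstack([base, base[:, [0]]])
y = (base[:, 0] + base[:, 1] > 0).astype(int)

# reg_param > 0 shrinks each class covariance towards the identity,
# keeping it full rank even when features are collinear.
qda = QuadraticDiscriminantAnalysis(reg_param=0.1).fit(X, y)
y_pred = qda.predict(X)
print("training accuracy with reg_param=0.1:", (y_pred == y).mean())
```

`reg_param` is a hyperparameter. On the real features it would normally be chosen by cross-validation rather than fixed.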

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01170778 0.01146626 0.03049469 0.03104663 0.02517843 0.0244348
 0.03004813 0.03791165 0.03372884 0.02444196]

mean value: 0.026045918464660645

key: score_time
value: [0.01076579 0.01079035 0.01894236 0.01367116 0.02186847 0.02198577
 0.01771045 0.02187347 0.02229476 0.02252173]

mean value: 0.018242430686950684

key: test_mcc
value: [0.82942474 0.76689254 0.79110556 0.9321832  0.71611487 0.82195294
 0.85933785 0.78772636 0.67900461 0.75047877]

mean value: 0.7934221443680324

key: train_mcc
value: [0.8266528  0.81876065 0.82265144 0.82358593 0.83910959 0.81142619
 0.80377277 0.81930411 0.83123063 0.80759374]

mean value: 0.8204087868380765

key: test_accuracy
value: [0.9122807  0.87719298 0.89473684 0.96491228 0.85714286 0.91071429
 0.92857143 0.89285714 0.83928571 0.875     ]

mean value: 0.8952694235588973

key: train_accuracy
value: [0.91321499 0.90927022 0.9112426  0.9112426  0.91929134 0.90551181
 0.9015748  0.90944882 0.91535433 0.90354331]

mean value: 0.9099694823650002

key: test_fscore
value: [0.91525424 0.8852459  0.9        0.96428571 0.86206897 0.90909091
 0.92592593 0.88888889 0.84210526 0.87719298]

mean value: 0.8970058788250195

key: train_fscore
value: [0.91439689 0.91050584 0.91193738 0.9132948  0.92069632 0.90697674
 0.9034749  0.91085271 0.91682785 0.90522244]

mean value: 0.9114185875040357

key: test_precision
value: [0.87096774 0.81818182 0.87096774 1.         0.83333333 0.92592593
 0.96153846 0.92307692 0.82758621 0.86206897]

mean value: 0.8893647118341224

key: train_precision
value: [0.90384615 0.9        0.90310078 0.89097744 0.90494297 0.89312977
 0.88636364 0.89694656 0.90114068 0.88973384]

mean value: 0.8970181835384771

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
 0.89285714 0.85714286 0.85714286 0.89285714]

mean value: 0.9076354679802956

key: train_recall
value: [0.92519685 0.92125984 0.92094862 0.93675889 0.93700787 0.92125984
 0.92125984 0.92519685 0.93307087 0.92125984]

mean value: 0.9263219320905045

key: test_roc_auc
value: [0.91317734 0.87869458 0.89408867 0.96551724 0.85714286 0.91071429
 0.92857143 0.89285714 0.83928571 0.875     ]

mean value: 0.8955049261083744

key: train_roc_auc
value: [0.91319131 0.90924652 0.91126171 0.91129283 0.91929134 0.90551181
 0.9015748  0.90944882 0.91535433 0.90354331]

mean value: 0.9099716784413806

key: test_jcc
value: [0.84375    0.79411765 0.81818182 0.93103448 0.75757576 0.83333333
 0.86206897 0.8        0.72727273 0.78125   ]

mean value: 0.8148584731698322

key: train_jcc
value: [0.84229391 0.83571429 0.8381295  0.84042553 0.85304659 0.82978723
 0.82394366 0.83629893 0.84642857 0.82685512]

mean value: 0.837292333932638

MCC on Blind test: 0.25

Accuracy on Blind test: 0.71
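The two ridge variants differ only in how the penalty strength is chosen:
- `RidgeClassifier` uses a fixed `alpha`, 1.0 by default.
- `RidgeClassifierCV(cv=10)` selects `alpha` from its candidate grid (0.1, 1.0 and 10.0 by default) using 10-fold cross-validation, and stores the choice in `alpha_`.

In the next block the CV variant scores slightly higher in cross-validation (mean `test_mcc` 0.808 vs 0.793) but gives the same blind-test MCC of 0.25. The sketch below uses synthetic data to show how to inspect which `alpha` was picked.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV

# Synthetic stand-in data; the real run uses the preprocessed feature matrix.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Fixed penalty (alpha defaults to 1.0).
fixed = RidgeClassifier(random_state=42).fit(X, y)

# Penalty chosen from the default grid (0.1, 1.0, 10.0) by 10-fold CV.
tuned = RidgeClassifierCV(cv=10).fit(X, y)

print("fixed alpha:", fixed.alpha, "| selected alpha:", tuned.alpha_)
```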

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
       'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
       'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
       'provean_score', 'maf', 'logorI', 'lineage_proportion',
       'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.18650627 0.19624949 0.19074392 0.19366646 0.20337486 0.19410229
 0.19437337 0.18854046 0.24514914 0.20517421]

mean value: 0.19978804588317872

key: score_time
value: [0.01944613 0.01078582 0.01317906 0.01077127 0.01899004 0.02023435
 0.02145123 0.01528096 0.01077437 0.02091956]

mean value: 0.01618328094482422

key: test_mcc
value: [0.82942474 0.76689254 0.79110556 0.9321832  0.75434227 0.82195294
 0.85933785 0.85933785 0.67900461 0.78571429]

mean value: 0.8079295836885118

key: train_mcc
value: [0.8266528  0.86654135 0.82265144 0.85931426 0.86681377 0.85105352
 0.80377277 0.85922715 0.86681377 0.85105352]

mean value: 0.847389438313628

key: test_accuracy
value: [0.9122807  0.87719298 0.89473684 0.96491228 0.875      0.91071429
 0.92857143 0.92857143 0.83928571 0.89285714]

mean value: 0.9024122807017544

key: train_accuracy
value: [0.91321499 0.93293886 0.9112426  0.92899408 0.93307087 0.92519685
 0.9015748  0.92913386 0.93307087 0.92519685]

mean value: 0.9233634627032568

key: test_fscore
value: [0.91525424 0.8852459  0.9        0.96428571 0.88135593 0.90909091
 0.92592593 0.92592593 0.84210526 0.89285714]

mean value: 0.9042046952374383

key: train_fscore
value: [0.91439689 0.93436293 0.91193738 0.93076923 0.93436293 0.92664093
 0.9034749  0.93076923 0.93436293 0.92664093]

mean value: 0.9247718286234357

key: test_precision
value: [0.87096774 0.81818182 0.87096774 1.         0.83870968 0.92592593
 0.96153846 0.96153846 0.82758621 0.89285714]

mean value: 0.8968273178228685

key: train_precision
value: [0.90384615 0.91666667 0.90310078 0.90636704 0.91666667 0.90909091
 0.88636364 0.90977444 0.91666667 0.90909091]

mean value: 0.9077633860874135

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
 0.89285714 0.89285714 0.85714286 0.89285714]

mean value: 0.9147783251231527

key: train_recall
value: [0.92519685 0.95275591 0.92094862 0.95652174 0.95275591 0.94488189
 0.92125984 0.95275591 0.95275591 0.94488189]

mean value: 0.9424714450219415

key: test_roc_auc
value: [0.91317734 0.87869458 0.89408867 0.96551724 0.875      0.91071429
 0.92857143 0.92857143 0.83928571 0.89285714]

mean value: 0.9026477832512315

key: train_roc_auc
value: [0.91319131 0.93289969 0.91126171 0.92904827 0.93307087 0.92519685
 0.9015748  0.92913386 0.93307087 0.92519685]

mean value: 0.9233645077962094

key: test_jcc
value: [0.84375    0.79411765 0.81818182 0.93103448 0.78787879 0.83333333
 0.86206897 0.86206897 0.72727273 0.80645161]

mean value: 0.826615834042182

key: train_jcc
value: [0.84229391 0.87681159 0.8381295  0.8705036  0.87681159 0.86330935
 0.82394366 0.8705036  0.87681159 0.86330935]

mean value: 0.8602427747074015

MCC on Blind test: 0.25

Accuracy on Blind test: 0.7
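The `SettingWithCopyWarning` raised at `rpob_config.py:203` and `:206` comes from calling `sort_values(..., inplace=True)` on `rouC_CT` and `rouC_BT`. pandas suspects these are slices of another DataFrame. There are two ways to avoid the ambiguity:
- take an explicit `.copy()` when the subset is created, or
- assign the result (`df = df.sort_values(...)`) instead of using `inplace=True`.

The sketch below uses a hypothetical results table. Only the column names `test_mcc` and `bts_mcc` and the rounded scores come from the log. How the script actually builds `rouC_CT` is not shown here.

```python
import warnings

import pandas as pd

# Hypothetical summary table using rounded scores from the log above.
results = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Gradient Boosting"],
    "test_mcc": [0.66, 0.79, 0.93],
    "bts_mcc": [0.31, 0.25, 0.10],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes the filtered rows an independent DataFrame, so the
    # in-place sort no longer triggers SettingWithCopyWarning.
    subset = results[results["bts_mcc"] > 0.2].copy()
    subset.sort_values(by=["test_mcc"], ascending=False, inplace=True)

print(subset)
```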