LSHTM_analysis/scripts/ml/log_katg_config.txt

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
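The SettingWithCopyWarning above comes from ml_data.py line 550, where mask_check is sorted in place. Below is a minimal sketch of the usual fix. It assumes mask_check was obtained by filtering a larger DataFrame; the df and its values are hypothetical. You can either take an explicit .copy() of the slice, or drop inplace=True and rebind the result.

```python
import pandas as pd

# Hypothetical stand-in for the feature table; only 'ligand_distance' comes from the log.
df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 8.7, 5.0],
})

# Option 1: make the filtered slice an independent DataFrame before mutating it,
# so the in-place sort cannot be mistaken for a write into df.
mask_check = df[df["ligand_distance"] > 4].copy()
mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)

# Option 2: avoid inplace entirely and rebind the sorted result to a name.
mask_check2 = df[df["ligand_distance"] > 4].sort_values(by=["ligand_distance"])

print(mask_check["ligand_distance"].tolist())
```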
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data

Total no. of non-numerical columns: 2

Selecting numerical data only

PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols

Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127

Revised df ncols: 123

Checking NA in revised df...

PASS: cols with NA successfully dropped from aaindex_df
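The numeric-only selection and NA-column dropping reported above can be sketched with pandas as follows. The miniature aaindex_df here is hypothetical. Only the steps mirror the log: keep the numeric columns, drop any column that contains an NA, then re-check.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature AAindex table: two non-numeric columns (as in the log)
# plus three numeric columns, one of which contains a missing value.
aaindex_df = pd.DataFrame({
    "mutation": ["A1B", "C2D", "E3F"],
    "wild_type": ["A", "C", "E"],
    "idx1": [0.1, 0.2, 0.3],
    "idx2": [1.0, np.nan, 3.0],
    "idx3": [5, 6, 7],
})

# Step 1: count and discard the non-numerical columns.
n_non_numeric = aaindex_df.select_dtypes(exclude="number").shape[1]
aa_num = aaindex_df.select_dtypes(include="number")
print("Total no. of non-numerical columns:", n_non_numeric)

# Step 2: drop every column containing at least one NA
# (equivalent to aa_num.dropna(axis=1, how="any")).
cols_with_na = aa_num.columns[aa_num.isna().any()]
print("ncols with NA:", len(cols_with_na))
aa_clean = aa_num.drop(columns=cols_with_na)

# Step 3: sanity checks analogous to the PASS lines in the log.
assert aa_clean.isna().sum().sum() == 0
assert aa_clean.shape[1] == aa_num.shape[1] - len(cols_with_na)
print("Revised df ncols:", aa_clean.shape[1])
```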
Proceeding with combining aa_df with other features_df

PASS: ncols match
Expected ncols: 123
Got: 123

Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 817

PASS: my_features_df and aa_df successfully combined
nrows: 817
ncols: 269
count of NULL values before imputation

or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation

mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64

PASS: OR values imputed, data ready for ML

No. of numerical features: 45
No. of categorical features: 7

index: 0
ind: 1

Mask count check: True

index: 1
ind: 2

Mask count check: True
Original Data
 Counter({1: 309, 0: 158}) Data dim: (467, 52)

-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (467, 52)
Test data size: (350, 52)
y_train numbers: Counter({1: 309, 0: 158})
y_train ratio: 0.511326860841424

y_test numbers: Counter({0: 315, 1: 35})
y_test ratio: 9.0
-------------------------------------------------------------
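The ratios printed above are count(class 0) / count(class 1). For training this is 158/309 ≈ 0.511, and for the blind test set it is 315/35 = 9.0. The two sets are therefore imbalanced in opposite directions. The stdlib sketch below reproduces both numbers from the logged counts. class_ratio is a hypothetical helper name, not necessarily the script's own.

```python
from collections import Counter

def class_ratio(labels):
    """Return (counts, ratio) where ratio = count(class 0) / count(class 1)."""
    counts = Counter(labels)
    return counts, counts[0] / counts[1]

# Label vectors rebuilt from the class counts shown in the log.
y_train = [1] * 309 + [0] * 158
y_test = [0] * 315 + [1] * 35

train_counts, train_ratio = class_ratio(y_train)
test_counts, test_ratio = class_ratio(y_test)
print("y_train numbers:", train_counts, "ratio:", train_ratio)
print("y_test numbers:", test_counts, "ratio:", test_ratio)
```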
Simple Random OverSampling
 Counter({1: 309, 0: 309})
(618, 52)
Simple Random UnderSampling
 Counter({0: 158, 1: 158})
(316, 52)
Simple Combined Over and UnderSampling
 Counter({0: 309, 1: 309})
(618, 52)
SMOTE_NC OverSampling
 Counter({1: 309, 0: 309})
(618, 52)
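The resampled sizes above follow directly from matching every class to a target count. Matching to the majority count (309) gives 618 rows, and matching to the minority count (158) gives 316. The labels suggest the script uses imbalanced-learn samplers such as RandomOverSampler, RandomUnderSampler and SMOTENC, but that is an assumption. The sketch below is a dependency-free NumPy version of the two random strategies only, not SMOTE-NC. It runs on synthetic data with the training set's shape and class counts.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of each smaller class up to the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=target - n, replace=True)  # empty for the majority
        idx.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, rng):
    """Keep a random subset of each class, without replacement, at the minority count."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    idx = np.concatenate(
        [rng.choice(np.flatnonzero(y == cls), size=target, replace=False) for cls in classes]
    )
    return X[idx], y[idx]

rng = np.random.default_rng(42)
# Synthetic features with the training set's shape and class counts from the log.
X = rng.normal(size=(467, 52))
y = np.array([1] * 309 + [0] * 158)

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
print(X_over.shape, np.bincount(y_over))
print(X_under.shape, np.bincount(y_under))
```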

#####################################################################

Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: katG
Drug name: isoniazid

Output directory: /home/tanu/git/Data/isoniazid/output/ml/uq_v1/

Sanity checks:
Total input features: 52

Training data size: (467, 52)
Test data size: (350, 52)

Target feature numbers (training data): Counter({1: 309, 0: 158})
Target features ratio (training data): 0.511326860841424

Target feature numbers (test data): Counter({0: 315, 1: 35})
Target features ratio (test data): 9.0

#####################################################################


================================================================

Structural features (n): 36
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

Evolutionary features (n): 3
These are:
 ['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
 ['maf', 'logorI']
 ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
 ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================


Pass: No. of features match
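The "No. of features match" check can be reproduced by summing the feature groups listed above. The structural group is 10 + 23 + 3 = 36. Adding 3 evolutionary and 6 genomic features gives the 45 numerical features, and the 7 categorical features bring the total to 52. Below is a sketch of that check. The group variable names are hypothetical; the column names are copied from the log.

```python
common_stability = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
                    'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
                    'mcsm_ppi2_affinity', 'interface_dist']
# FoldX terms: 5 energy terms x 4 pair types, plus volumetric without the 'sm' pair.
foldx = [f"{term}_{pair}"
         for term in ("electro", "disulfide", "hbonds", "partcov", "vdwclashes")
         for pair in ("rr", "mm", "sm", "ss")] + ["volumetric_rr", "volumetric_mm", "volumetric_ss"]
other_struc = ['rsa', 'kd_values', 'rd_values']
evolutionary = ['consurf_score', 'snap2_score', 'provean_score']
genomic = ['maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion',
           'lineage_count_all', 'lineage_count_unique']
categorical = ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change',
               'water_change', 'drtype_mode_labels', 'active_site']

structural = common_stability + foldx + other_struc
numerical = structural + evolutionary + genomic
all_features = numerical + categorical

# Every column must appear exactly once, and the total must equal the input width.
if len(all_features) == len(set(all_features)) == 52:
    print("Pass: No. of features match")
```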

#####################################################################


Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
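The pipeline printed above scales the numerical columns with MinMaxScaler, one-hot encodes the categorical ones, and feeds the result to the model. Below is a minimal, runnable sketch of the same structure on a hypothetical four-column toy frame. One deviation is deliberate: handle_unknown="ignore" replaces the log's default OneHotEncoder(), which raises an error when a category appears only in a test fold.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Small subsets of the column names from the log; values are synthetic.
num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    remainder="passthrough",  # any other columns are passed through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegression(random_state=42))])

rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
    "active_site": rng.choice([0, 1], n),
})
y = (X["ligand_distance"] < 15).astype(int)  # toy target

pipe.fit(X, y)
pred = pipe.predict(X)
```

Wrapping the preprocessing in the Pipeline means the scaler and encoder are refit on each training fold during cross-validation, so no statistics leak from the held-out fold.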

key: fit_time
value: [0.02167606 0.02372026 0.03166604 0.02357769 0.02548194 0.02195692
 0.02136278 0.02161574 0.02221417 0.02264333]

mean value: 0.023591494560241698

key: score_time
value: [0.0109992  0.01075363 0.01093793 0.01066351 0.01062679 0.01058674
 0.01058102 0.01062608 0.0105927  0.01066446]

mean value: 0.010703206062316895

key: test_mcc
value: [0.90662544 0.66402366 0.60908698 0.90662544 0.86070252 0.66337469
 0.67402153 0.80215054 0.66040066 0.85943956]

mean value: 0.7606451028769974

key: train_mcc
value: [0.83338837 0.82273265 0.789683   0.77877628 0.76217448 0.80630977
 0.79579908 0.77434754 0.7963019  0.80086095]

mean value: 0.7960374023577294
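The key / value / mean value blocks match the dictionary returned by scikit-learn's cross_validate when it is given a named scorer dict and return_train_score=True. That dictionary holds fit_time and score_time, then test_<name> and train_<name> for each scorer. Below is a sketch under that assumption, on synthetic data. The 10 stratified folds and the make_scorer(matthews_corrcoef) scorer are guesses, consistent with the ten values per key shown above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, mildly imbalanced data with the same number of rows as the training set.
X, y = make_classification(n_samples=467, n_features=10, weights=[0.34], random_state=42)

scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```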

key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
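This ConvergenceWarning, repeated many times in the run, means lbfgs hit LogisticRegression's default limit of max_iter=100. The numerical inputs are already MinMax-scaled in this pipeline. The remaining options the warning itself suggests are a larger max_iter or a different solver, for example solver="liblinear" or "saga". The sketch below, on synthetic data, raises max_iter and checks how many iterations were actually used. The value 5000 is an arbitrary illustrative budget.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data with the same width as the log's input features.
X, y = make_classification(n_samples=467, n_features=52, n_informative=10, random_state=42)

# Same default solver (lbfgs) as in the log, but with a larger iteration budget.
clf = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=5000, random_state=42))
clf.fit(X, y)

# n_iter_ holds the iterations lbfgs actually ran; below max_iter means it converged.
n_iter = clf[-1].n_iter_[0]
print("lbfgs iterations used:", n_iter)
```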
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.95744681 0.85106383 0.82978723 0.95744681 0.93617021 0.85106383
 0.85106383 0.91304348 0.84782609 0.93478261]

mean value: 0.8929694727104533

key: train_accuracy
value: [0.92619048 0.92142857 0.90714286 0.90238095 0.8952381  0.91428571
 0.90952381 0.90023753 0.90973872 0.91211401]

mean value: 0.9098280737473137
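The per-fold values are easier to read once the fold sizes are known. With the 467 training rows reported earlier in the log and 10 folds, the accuracies imply 47 test rows in the first seven folds and 46 in the last three (train folds of 420 and 421 rows). A sketch of that arithmetic, assuming the plain k-fold sizing rule; the stratified splitter actually used may allocate rows differently in general, although the sizes here agree:

```python
def kfold_sizes(n_samples: int, n_splits: int) -> list:
    """Test-fold sizes under plain k-fold sizing: every fold gets
    n_samples // n_splits rows and the first n_samples % n_splits folds get one extra."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

n_samples = 467  # training rows reported earlier in the log
test_sizes = kfold_sizes(n_samples, 10)
train_sizes = [n_samples - t for t in test_sizes]
print(test_sizes)   # [47, 47, 47, 47, 47, 47, 47, 46, 46, 46]
print(train_sizes)  # [420, 420, 420, 420, 420, 420, 420, 421, 421, 421]

# Logged accuracies are simple fractions of these sizes, e.g.
# test fold 1: 45/47 -> 0.95744681, train fold 1: 389/420 -> 0.92619048
print(round(45 / 47, 8), round(389 / 420, 8))
```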

key: test_fscore
value: [0.96875    0.88888889 0.87878788 0.96875    0.95384615 0.89552239
 0.8852459  0.93548387 0.8852459  0.95238095]

mean value: 0.9212901936210006

key: train_fscore
value: [0.94532628 0.94240838 0.93169877 0.92869565 0.92334495 0.93728223
 0.93425606 0.92682927 0.93379791 0.93542757]

mean value: 0.9339067066812484

key: test_precision
value: [0.93939394 0.875      0.82857143 0.93939394 0.91176471 0.83333333
 0.9        0.93548387 0.9        0.90909091]

mean value: 0.8972032126633644

key: train_precision
value: [0.92733564 0.91525424 0.90784983 0.8989899  0.89527027 0.90878378
 0.9        0.89864865 0.90540541 0.91156463]

mean value: 0.9069102339726427

key: test_recall
value: [1.         0.90322581 0.93548387 1.         1.         0.96774194
 0.87096774 0.93548387 0.87096774 1.        ]

mean value: 0.9483870967741935

key: train_recall
value: [0.96402878 0.97122302 0.95683453 0.96043165 0.95323741 0.9676259
 0.97122302 0.95683453 0.96402878 0.96057348]

mean value: 0.962604110260179

key: test_roc_auc
value: [0.9375     0.8266129  0.78024194 0.9375     0.90625    0.79637097
 0.84173387 0.90107527 0.83548387 0.90625   ]

mean value: 0.8669018817204301

key: train_roc_auc
value: [0.90807073 0.89758334 0.88334684 0.87458202 0.86746378 0.88874253
 0.87997771 0.87352216 0.88411229 0.88873744]

mean value: 0.8846138841461438

key: test_jcc
value: [0.93939394 0.8        0.78378378 0.93939394 0.91176471 0.81081081
 0.79411765 0.87878788 0.79411765 0.90909091]

mean value: 0.8561261261261262

key: train_jcc
value: [0.89632107 0.89108911 0.87213115 0.86688312 0.85760518 0.88196721
 0.87662338 0.86363636 0.87581699 0.87868852]

mean value: 0.8760762092991343

MCC on Blind test: 0.23

Accuracy on Blind test: 0.45
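The blind-test lines report MCC and accuracy, which can diverge sharply. A sketch of both metrics from binary confusion-matrix counts follows. The first set of counts is reconstructed from fold 1 of the Logistic Regression results above (accuracy 45/47, precision 31/33, recall 31/31). The second set is purely hypothetical and only illustrates one way a low accuracy can coexist with a positive MCC: a model that over-predicts the positive class on a negative-heavy set. The actual blind-test counts are not in the log.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

# Fold 1 counts reconstructed from the logged precision, recall and accuracy
fold1 = dict(tp=31, tn=14, fp=2, fn=0)
print(round(accuracy(**fold1), 8), round(mcc(**fold1), 4))  # 0.95744681 0.9066

# Hypothetical counts: all 20 positives found, but 55 of 80 negatives mislabelled
hypo = dict(tp=20, tn=25, fp=55, fn=0)
print(round(accuracy(**hypo), 2), round(mcc(**hypo), 2))    # 0.45 0.29
```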

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
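The 'num' branch of this pipeline rescales each numeric column with MinMaxScaler, i.e. (x - min) / (max - min), where min and max are learned on the training fold only. Test-fold values can therefore fall outside [0, 1]. A stdlib sketch of that fit/transform split follows; the helper names are hypothetical, and mapping a constant column to 0.0 is an assumption that avoids division by zero:

```python
def fit_min_max(train_column):
    """Learn the offset and range from training values only."""
    lo, hi = min(train_column), max(train_column)
    span = (hi - lo) or 1.0  # constant column: use 1.0 so every value maps to 0.0
    return lo, span

def transform(column, lo, span):
    """Apply the training-fold scaling to any column (train or test)."""
    return [(x - lo) / span for x in column]

train = [2.0, 4.0, 6.0, 10.0]
lo, span = fit_min_max(train)
print(transform(train, lo, span))         # [0.0, 0.25, 0.5, 1.0]
print(transform([12.0, 0.0], lo, span))   # [1.25, -0.25]  (outside [0, 1])
```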

key: fit_time
value: [0.74151611 1.08886981 0.69769788 0.70535755 0.87346554 0.72519445
 0.73284912 0.83741045 0.65530038 0.68675303]

mean value: 0.7744414329528808

key: score_time
value: [0.01378059 0.01389503 0.01416969 0.01405454 0.01443934 0.0140748
 0.01437092 0.01120043 0.0144248  0.01425123]

mean value: 0.013866138458251954

key: test_mcc
value: [1.         0.8566725  1.         0.95299692 0.90662544 0.76032282
 0.90524194 0.9085301  0.85513419 0.85513419]

mean value: 0.90006580934109
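Each "mean value" line appears to be the arithmetic mean of the ten per-fold values printed above it. The mean hides the spread: here three folds reach an MCC of 1.0 while fold 6 drops to about 0.76. A short sketch that reproduces the mean from the logged (rounded) values and adds the sample standard deviation:

```python
from statistics import mean, stdev

# test_mcc per fold for LogisticRegressionCV, copied from the log above
test_mcc = [1.0, 0.8566725, 1.0, 0.95299692, 0.90662544, 0.76032282,
            0.90524194, 0.9085301, 0.85513419, 0.85513419]

print(round(mean(test_mcc), 6))   # 0.900066
print(round(stdev(test_mcc), 4))  # fold-to-fold spread (sample standard deviation)
```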

key: train_mcc
value: [0.93593571 0.96269263 0.94130059 0.93593571 0.95736701 0.95734993
 0.94131391 0.9469026  0.95756757 0.95740101]

mean value: 0.9493766673456756

key: test_accuracy
value: [1.         0.93617021 1.         0.9787234  0.95744681 0.89361702
 0.95744681 0.95652174 0.93478261 0.93478261]

mean value: 0.9549491211840888

key: train_accuracy
value: [0.97142857 0.98333333 0.97380952 0.97142857 0.98095238 0.98095238
 0.97380952 0.97624703 0.98099762 0.98099762]

mean value: 0.9773956565999321

key: test_fscore
value: [1.         0.95238095 1.         0.98412698 0.96875    0.92307692
 0.96774194 0.96666667 0.95081967 0.95081967]

mean value: 0.9664382805997692

key: train_fscore
value: [0.97857143 0.98747764 0.980322   0.97857143 0.98571429 0.98566308
 0.98039216 0.98214286 0.98571429 0.98571429]

mean value: 0.983028345294684

key: test_precision
value: [1.         0.9375     1.         0.96875    0.93939394 0.88235294
 0.96774194 1.         0.96666667 0.93548387]

mean value: 0.959788935368869

key: train_precision
value: [0.97163121 0.98220641 0.97508897 0.97163121 0.9787234  0.98214286
 0.97173145 0.9751773  0.9787234  0.98220641]

mean value: 0.9769262610088233

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 0.96774194 0.93548387 0.93548387 0.96666667]

mean value: 0.9740860215053764

key: train_recall
value: [0.98561151 0.99280576 0.98561151 0.98561151 0.99280576 0.98920863
 0.98920863 0.98920863 0.99280576 0.98924731]

mean value: 0.9892125009669683

key: test_roc_auc
value: [1.         0.92137097 1.         0.96875    0.9375     0.85887097
 0.95262097 0.96774194 0.9344086  0.92083333]

mean value: 0.9462096774193549
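The test_roc_auc values appear consistent with an AUC computed from hard class predictions (as a scorer built with make_scorer(roc_auc_score) and default settings would do) rather than from predicted probabilities. With binary scores, ROC AUC reduces to balanced accuracy, (TPR + TNR) / 2. A sketch that checks this on fold 2 of the LogisticRegressionCV results above, whose counts can be reconstructed from the logged recall 30/31, precision 30/32 and accuracy 44/47:

```python
def pairwise_auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def balanced_accuracy(tp, tn, fp, fn):
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Fold 2 counts: TP=30, FN=1, FP=2, TN=14
tp, fn, fp, tn = 30, 1, 2, 14
pos_preds = [1] * tp + [0] * fn   # hard predictions for the 31 positives
neg_preds = [1] * fp + [0] * tn   # hard predictions for the 16 negatives

print(round(pairwise_auc(pos_preds, neg_preds), 8))  # 0.92137097
print(round(balanced_accuracy(tp, tn, fp, fn), 8))   # 0.92137097
```

If probability-based AUC is what was intended, the scorer would need to use predicted probabilities or decision scores instead of predict.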

key: train_roc_auc
value: [0.96463674 0.97879724 0.96815787 0.96463674 0.97527612 0.97699868
 0.9664353  0.97012879 0.97542386 0.97701802]

mean value: 0.9717509367831001

key: test_jcc
value: [1.         0.90909091 1.         0.96875    0.93939394 0.85714286
 0.9375     0.93548387 0.90625    0.90625   ]

mean value: 0.9359861576595447
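Per fold, test_jcc carries no information beyond test_fscore. For a single binary confusion matrix, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), which gives J = F1 / (2 - F1). The sketch below checks that identity against the LogisticRegressionCV fold values logged above. Note that the mean of the fold Jaccards is not the same as J applied to the mean F1.

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index of the positive class from its F1 score (binary case)."""
    return f1 / (2.0 - f1)

# Per-fold test_fscore and test_jcc for LogisticRegressionCV, copied from the log
test_fscore = [1.0, 0.95238095, 1.0, 0.98412698, 0.96875, 0.92307692,
               0.96774194, 0.96666667, 0.95081967, 0.95081967]
test_jcc = [1.0, 0.90909091, 1.0, 0.96875, 0.93939394, 0.85714286,
            0.9375, 0.93548387, 0.90625, 0.90625]

max_err = max(abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc))
print(f"largest per-fold discrepancy: {max_err:.1e}")  # rounding-level only
```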

key: train_jcc
value: [0.95804196 0.97526502 0.96140351 0.95804196 0.97183099 0.97173145
 0.96153846 0.96491228 0.97183099 0.97183099]

mean value: 0.9666427591273636

MCC on Blind test: 0.14

Accuracy on Blind test: 0.32

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01048064 0.00996375 0.00781631 0.00742173 0.00739574 0.00739932
 0.00733399 0.00878787 0.0088315  0.00836849]

mean value: 0.008379936218261719

key: score_time
value: [0.01066589 0.00898504 0.0085032  0.0080471  0.00794482 0.0084784
 0.00797868 0.00964499 0.00877905 0.00852466]

mean value: 0.008755183219909668

key: test_mcc
value: [0.8566725  0.50614703 0.62096774 0.76032282 0.81048387 0.71572581
 0.59764284 0.75776742 0.60430108 0.36514837]

mean value: 0.6595179479313003

key: train_mcc
value: [0.70671585 0.70811111 0.71695894 0.68716403 0.71727396 0.73126698
 0.71138479 0.71852622 0.74194944 0.54109586]

mean value: 0.6980447184919443

key: test_accuracy
value: [0.93617021 0.76595745 0.82978723 0.89361702 0.91489362 0.87234043
 0.80851064 0.89130435 0.82608696 0.67391304]

mean value: 0.8412580943570768

key: train_accuracy
value: [0.87142857 0.86666667 0.86904762 0.85714286 0.87142857 0.87857143
 0.86904762 0.87173397 0.88361045 0.74821853]

mean value: 0.8586896278701505

key: test_fscore
value: [0.95238095 0.81355932 0.87096774 0.92307692 0.93548387 0.90322581
 0.84745763 0.91803279 0.87096774 0.71698113]

mean value: 0.8752133904861458

key: train_fscore
value: [0.90721649 0.8974359  0.89833641 0.89010989 0.90145985 0.90744102
 0.89981785 0.90145985 0.91139241 0.77916667]

mean value: 0.8893836343169823

key: test_precision
value: [0.9375     0.85714286 0.87096774 0.88235294 0.93548387 0.90322581
 0.89285714 0.93333333 0.87096774 0.82608696]

mean value: 0.8909918392321865

key: train_precision
value: [0.86842105 0.9141791  0.92395437 0.90671642 0.91481481 0.91575092
 0.91143911 0.91481481 0.91636364 0.93034826]

mean value: 0.9116802502485006

key: test_recall
value: [0.96774194 0.77419355 0.87096774 0.96774194 0.93548387 0.90322581
 0.80645161 0.90322581 0.87096774 0.63333333]

mean value: 0.8633333333333333

key: train_recall
value: [0.94964029 0.88129496 0.87410072 0.87410072 0.88848921 0.89928058
 0.88848921 0.88848921 0.90647482 0.6702509 ]

mean value: 0.8720610608287563

key: test_roc_auc
value: [0.92137097 0.76209677 0.81048387 0.85887097 0.90524194 0.8578629
 0.80947581 0.88494624 0.80215054 0.69166667]

mean value: 0.8304166666666667

key: train_roc_auc
value: [0.83397507 0.85966157 0.86662782 0.84902219 0.86325869 0.86865437
 0.85973756 0.86382502 0.87281783 0.78582967]

mean value: 0.8523409805276452

key: test_jcc
value: [0.90909091 0.68571429 0.77142857 0.85714286 0.87878788 0.82352941
 0.73529412 0.84848485 0.77142857 0.55882353]

mean value: 0.7839724980901451

key: train_jcc
value: [0.83018868 0.81395349 0.81543624 0.8019802  0.82059801 0.83056478
 0.81788079 0.82059801 0.8372093  0.63822526]

mean value: 0.8026634757590374

MCC on Blind test: 0.22

Accuracy on Blind test: 0.56

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
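BernoulliNB(), as configured here, keeps its default binarize=0.0. Each input feature is therefore turned into 1 if it is strictly greater than 0.0 and into 0 otherwise before the model is fitted. After MinMax scaling, only a value at (or below) the training minimum stays 0, so each numeric feature effectively becomes "above the training minimum or not". A stdlib sketch of that thresholding (a helper written for illustration, not scikit-learn's code):

```python
def binarize(row, threshold=0.0):
    """Map each feature to 1 if it is strictly greater than the threshold, else 0."""
    return [1 if x > threshold else 0 for x in row]

# A MinMax-scaled row: only the exact training minimum (0.0) stays 0 at the default threshold
scaled_row = [0.0, 0.03, 0.5, 1.0]
print(binarize(scaled_row))       # [0, 1, 1, 1]
print(binarize(scaled_row, 0.5))  # [0, 0, 0, 1]
```

Passing a different binarize value to BernoulliNB (for example 0.5 for scaled features) would change which numeric values count as "on".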

key: fit_time
value: [0.00818086 0.0079577  0.00792146 0.00760269 0.00763583 0.00755167
 0.00756431 0.00766754 0.00792885 0.00768661]

mean value: 0.00776975154876709

key: score_time
value: [0.00826931 0.00858402 0.0080297  0.00802231 0.00803876 0.00794721
 0.00808096 0.00816536 0.00821137 0.0079844 ]

mean value: 0.008133339881896972

key: test_mcc
value: [0.76746995 0.61207663 0.31752781 0.71206211 0.76032282 0.6139232
 0.66402366 0.59332241 0.38733878 0.70954337]

mean value: 0.6137610732708011

key: train_mcc
value: [0.62791789 0.64521328 0.66619129 0.63945586 0.63982246 0.63982246
 0.6506538  0.65794031 0.65846852 0.63442864]

mean value: 0.6459914516114823

key: test_accuracy
value: [0.89361702 0.82978723 0.70212766 0.87234043 0.89361702 0.82978723
 0.85106383 0.82608696 0.73913043 0.86956522]

mean value: 0.8307123034227567

key: train_accuracy
value: [0.83809524 0.8452381  0.85238095 0.84285714 0.84285714 0.84285714
 0.84761905 0.85035629 0.85035629 0.84085511]

mean value: 0.8453472457866757

key: test_fscore
value: [0.91803279 0.875      0.78125    0.90909091 0.92307692 0.88235294
 0.88888889 0.875      0.8125     0.90625   ]

mean value: 0.8771442449118437

key: train_fscore
value: [0.88316151 0.88773748 0.89007092 0.8862069  0.88581315 0.88581315
 0.88965517 0.89156627 0.89081456 0.88468158]

mean value: 0.8875520685563664

key: test_precision
value: [0.93333333 0.84848485 0.75757576 0.85714286 0.88235294 0.81081081
 0.875      0.84848485 0.78787879 0.85294118]

mean value: 0.8454005361358302

key: train_precision
value: [0.84539474 0.8538206  0.87762238 0.85099338 0.85333333 0.85333333
 0.85430464 0.85478548 0.85953177 0.85099338]

mean value: 0.8554113020989377

key: test_recall
value: [0.90322581 0.90322581 0.80645161 0.96774194 0.96774194 0.96774194
 0.90322581 0.90322581 0.83870968 0.96666667]

mean value: 0.9127956989247312

key: train_recall
value: [0.92446043 0.92446043 0.9028777  0.92446043 0.92086331 0.92086331
 0.92805755 0.93165468 0.92446043 0.92114695]

mean value: 0.9223305226786314

key: test_roc_auc
value: [0.8891129  0.7953629  0.65322581 0.82762097 0.85887097 0.76512097
 0.8266129  0.78494624 0.68602151 0.82708333]

mean value: 0.7913978494623656

key: train_roc_auc
value: [0.79673726 0.80730064 0.82819941 0.80377951 0.80550208 0.80550208
 0.8090992  0.81198118 0.81537707 0.80212277]

mean value: 0.8085601200017799

key: test_jcc
value: [0.84848485 0.77777778 0.64102564 0.83333333 0.85714286 0.78947368
 0.8        0.77777778 0.68421053 0.82857143]

mean value: 0.783779787463998

key: train_jcc
value: [0.79076923 0.79813665 0.80191693 0.79566563 0.79503106 0.79503106
 0.80124224 0.80434783 0.803125   0.79320988]

mean value: 0.7978475494770487

MCC on Blind test: 0.24

Accuracy on Blind test: 0.47

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00735164 0.00838447 0.00824022 0.00810122 0.00807238 0.0080514
 0.00810838 0.00790358 0.00764203 0.00775075]

mean value: 0.00796060562133789

key: score_time
value: [0.09431863 0.01160264 0.01148391 0.01503587 0.01451468 0.0130167
 0.01415229 0.01103735 0.01092792 0.01100278]

mean value: 0.02070927619934082

key: test_mcc
value: [0.76746995 0.76034808 0.4031367  0.65994312 0.71025956 0.61207663
 0.56769924 0.58251534 0.49033059 0.48102958]

mean value: 0.6034808785180602

key: train_mcc
value: [0.69858559 0.69632669 0.75172804 0.69676775 0.73520628 0.71297421
 0.70164234 0.70915156 0.73690278 0.72050578]

mean value: 0.7159791011797761

key: test_accuracy
value: [0.89361702 0.89361702 0.74468085 0.85106383 0.87234043 0.82978723
 0.80851064 0.80434783 0.7826087  0.76086957]

mean value: 0.8241443108233117

key: train_accuracy
value: [0.86666667 0.86666667 0.89047619 0.86666667 0.88333333 0.87380952
 0.86904762 0.87173397 0.88361045 0.87648456]

mean value: 0.8748495645288994

key: test_fscore
value: [0.91803279 0.92063492 0.81818182 0.89230769 0.90625    0.875
 0.85714286 0.84745763 0.84375    0.81355932]

mean value: 0.8692317024305076

key: train_fscore
value: [0.90070922 0.9020979  0.91901408 0.90175439 0.91388401 0.90718039
 0.90401396 0.90526316 0.91358025 0.90812721]

mean value: 0.9075624559641324

key: test_precision
value: [0.93333333 0.90625    0.77142857 0.85294118 0.87878788 0.84848485
 0.84375    0.89285714 0.81818182 0.82758621]

mean value: 0.8573600976440733

key: train_precision
value: [0.88811189 0.87755102 0.9        0.88013699 0.89347079 0.88395904
 0.8779661  0.88356164 0.89619377 0.89547038]

mean value: 0.887642163000012

key: test_recall
value: [0.90322581 0.93548387 0.87096774 0.93548387 0.93548387 0.90322581
 0.87096774 0.80645161 0.87096774 0.8       ]

mean value: 0.8832258064516129

key: train_recall
value: [0.91366906 0.92805755 0.93884892 0.92446043 0.9352518  0.93165468
 0.93165468 0.92805755 0.93165468 0.92114695]

mean value: 0.9284456305923003

key: test_roc_auc
value: [0.8891129  0.87399194 0.68548387 0.81149194 0.84274194 0.7953629
 0.77923387 0.80322581 0.73548387 0.74375   ]

mean value: 0.7959879032258065

key: train_roc_auc
value: [0.84415848 0.83726821 0.86731178 0.83899078 0.85847097 0.84610903
 0.83906677 0.84514766 0.86093223 0.85493967]

mean value: 0.8492395591157109

key: test_jcc
value: [0.84848485 0.85294118 0.69230769 0.80555556 0.82857143 0.77777778
 0.75       0.73529412 0.72972973 0.68571429]

mean value: 0.7706376612258965

key: train_jcc
value: [0.81935484 0.82165605 0.85016287 0.82108626 0.84142395 0.83012821
 0.82484076 0.82692308 0.84090909 0.83171521]

mean value: 0.8308200313963069

MCC on Blind test: 0.2

Accuracy on Blind test: 0.45

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01428485 0.0115571  0.0116024  0.01192141 0.0118351  0.01425028
 0.01206684 0.01400876 0.01199913 0.01239347]

mean value: 0.012591934204101563

key: score_time
value: [0.00854349 0.00847244 0.00854373 0.00855494 0.00946355 0.00848031
 0.00859547 0.00851321 0.0085175  0.00893998]

mean value: 0.00866246223449707

key: test_mcc
value: [0.8566725  0.71206211 0.50611184 0.76032282 0.66337469 0.6139232
 0.65994312 0.64852426 0.38733878 0.72168784]

mean value: 0.6529961162737778

key: train_mcc
value: [0.69022744 0.66164278 0.68466145 0.65612626 0.66739922 0.67302425
 0.67350891 0.66972224 0.68052658 0.67334868]

mean value: 0.6730187805126882

key: test_accuracy
value: [0.93617021 0.87234043 0.78723404 0.89361702 0.85106383 0.82978723
 0.85106383 0.84782609 0.73913043 0.86956522]

mean value: 0.8477798334875115

key: train_accuracy
value: [0.86428571 0.85238095 0.86190476 0.85       0.8547619  0.85714286
 0.85714286 0.85510689 0.85985748 0.85748219]

mean value: 0.8570065603438525

key: test_fscore
value: [0.95238095 0.90909091 0.84848485 0.92307692 0.89552239 0.88235294
 0.89230769 0.88888889 0.8125     0.90909091]

mean value: 0.8913696452557295

key: train_fscore
value: [0.90289608 0.89419795 0.90136054 0.89303905 0.89608177 0.89761092
 0.89830508 0.89678511 0.89948893 0.89795918]

mean value: 0.8977724625814629

key: test_precision
value: [0.9375     0.85714286 0.8        0.88235294 0.83333333 0.81081081
 0.85294118 0.875      0.78787879 0.83333333]

mean value: 0.8470293240146182

key: train_precision
value: [0.85760518 0.85064935 0.85483871 0.84565916 0.85113269 0.8538961
 0.84935897 0.84664537 0.85436893 0.85436893]

mean value: 0.8518523398136467

key: test_recall
value: [0.96774194 0.96774194 0.90322581 0.96774194 0.96774194 0.96774194
 0.93548387 0.90322581 0.83870968 1.        ]

mean value: 0.9419354838709677

key: train_recall
value: [0.95323741 0.94244604 0.95323741 0.94604317 0.94604317 0.94604317
 0.95323741 0.95323741 0.94964029 0.94623656]

mean value: 0.9489402026765684

key: test_roc_auc
value: [0.92137097 0.82762097 0.7328629  0.85887097 0.79637097 0.76512097
 0.81149194 0.81827957 0.68602151 0.8125    ]

mean value: 0.8030510752688172

key: train_roc_auc
value: [0.82168913 0.80925119 0.818168   0.8040075  0.81104975 0.81457088
 0.81112575 0.80878654 0.81747749 0.81466758]

mean value: 0.813079379384182

key: test_jcc
value: [0.90909091 0.83333333 0.73684211 0.85714286 0.81081081 0.78947368
 0.80555556 0.8        0.68421053 0.83333333]

mean value: 0.8059793115056273

key: train_jcc
value: [0.82298137 0.80864198 0.82043344 0.80674847 0.8117284  0.81424149
 0.81538462 0.81288344 0.81733746 0.81481481]

mean value: 0.8145195452770848

MCC on Blind test: 0.25

Accuracy on Blind test: 0.45

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.31461072 1.40823627 1.28151274 1.4231658  1.334095   1.30517697
 1.41587329 1.28833318 1.49593544 1.34908724]

mean value: 1.3616026639938354

key: score_time
value: [0.01176286 0.01351857 0.0135088  0.01388788 0.01229548 0.01362157
 0.01102948 0.01351404 0.01373792 0.01853848]

mean value: 0.013541507720947265

key: test_mcc
value: [1.         0.8084425  0.90662544 1.         0.95299692 0.76032282
 0.90524194 0.90107527 0.74930844 0.80833333]

mean value: 0.8792346661083966

key: train_mcc
value: [0.9680267  0.95736701 0.94674008 0.9680267  0.96269263 0.9680267
 0.9628398  0.96296053 0.95222181 0.99470992]

mean value: 0.9643611879690016

key: test_accuracy
value: [1.         0.91489362 0.95744681 1.         0.9787234  0.89361702
 0.95744681 0.95652174 0.89130435 0.91304348]

mean value: 0.9462997224791859

key: train_accuracy
value: [0.98571429 0.98095238 0.97619048 0.98571429 0.98333333 0.98571429
 0.98333333 0.98337292 0.97862233 0.9976247 ]

mean value: 0.9840572333446442

key: test_fscore
value: [1.         0.9375     0.96875    1.         0.98412698 0.92307692
 0.96774194 0.96774194 0.92063492 0.93333333]

mean value: 0.9602906032139903

key: train_fscore
value: [0.98924731 0.98571429 0.98220641 0.98924731 0.98747764 0.98924731
 0.98738739 0.98752228 0.98389982 0.99820467]

mean value: 0.9880154423532531

key: test_precision
value: [1.         0.90909091 0.93939394 1.         0.96875    0.88235294
 0.96774194 0.96774194 0.90625    0.93333333]

mean value: 0.9474654993962395

key: train_precision
value: [0.98571429 0.9787234  0.97183099 0.98571429 0.98220641 0.98571429
 0.98916968 0.97879859 0.97864769 1.        ]

mean value: 0.9836519601503051

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 0.96774194 0.96774194 0.93548387 0.93333333]

mean value: 0.9739784946236559

key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576
 0.98561151 0.99640288 0.98920863 0.99641577]

mean value: 0.9924473324566154

key: test_roc_auc
value: [1.         0.89012097 0.9375     1.         0.96875    0.85887097
 0.95262097 0.95053763 0.86774194 0.90416667]

mean value: 0.9330309139784947

key: train_roc_auc
value: [0.98231837 0.97527612 0.96823386 0.98231837 0.97879724 0.98231837
 0.98224238 0.97722242 0.9736253  0.99820789]

mean value: 0.980056031046588

key: test_jcc
value: [1.         0.88235294 0.93939394 1.         0.96875    0.85714286
 0.9375     0.9375     0.85294118 0.875     ]

mean value: 0.9250580914183856

key: train_jcc
value: [0.9787234  0.97183099 0.96503497 0.9787234  0.97526502 0.9787234
 0.97508897 0.97535211 0.96830986 0.99641577]

mean value: 0.9763467891796095

MCC on Blind test: 0.13

Accuracy on Blind test: 0.31

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01342225 0.01069498 0.00975204 0.01033854 0.00994968 0.01040697
 0.01035452 0.01077914 0.01066208 0.01090336]

mean value: 0.010726356506347656

key: score_time
value: [0.01061678 0.00818062 0.00800824 0.00842381 0.00850368 0.00858855
 0.00848293 0.00844717 0.00845146 0.00849843]

mean value: 0.008620166778564453

key: test_mcc
value: [0.95299692 0.8566725  0.91188882 1.         0.86091836 0.8566725
 0.87213027 0.95250095 0.90107527 0.80833333]

mean value: 0.8973188916801316
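Each "mean value" line appears to be the plain arithmetic mean of the ten per-fold scores listed above it. A minimal sketch under that assumption, using the Decision Tree test_mcc values copied from this log (the values are rounded to 8 decimals in the log, so the result agrees only to about 1e-9):

```python
from statistics import mean

# Per-fold test MCC values for the Decision Tree, copied from the log above.
test_mcc = [0.95299692, 0.8566725, 0.91188882, 1.0, 0.86091836,
            0.8566725, 0.87213027, 0.95250095, 0.90107527, 0.80833333]

# Assumption: "mean value" is the unweighted arithmetic mean over the 10 folds.
mean_value = mean(test_mcc)
print(f"mean value: {mean_value}")
```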

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.9787234  0.93617021 0.95744681 1.         0.93617021 0.93617021
 0.93617021 0.97826087 0.95652174 0.91304348]

mean value: 0.9528677150786309

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98412698 0.95238095 0.96666667 1.         0.95081967 0.95238095
 0.94915254 0.98360656 0.96774194 0.93333333]

mean value: 0.9640209596253838

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96875    0.9375     1.         1.         0.96666667 0.9375
 1.         1.         0.96774194 0.93333333]

mean value: 0.9711491935483871

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 0.93548387 1.         0.93548387 0.96774194
 0.90322581 0.96774194 0.96774194 0.93333333]

mean value: 0.9578494623655914

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96875    0.92137097 0.96774194 1.         0.93649194 0.92137097
 0.9516129  0.98387097 0.95053763 0.90416667]

mean value: 0.9505913978494623

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96875    0.90909091 0.93548387 1.         0.90625    0.90909091
 0.90322581 0.96774194 0.9375     0.875     ]

mean value: 0.9312133431085043

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
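The per-fold scores above can be cross-checked from confusion-matrix counts using the standard binary-classification definitions. The counts in this sketch are not printed in the log; they are inferred for Decision Tree fold 1 from its logged recall (1.0), precision (0.96875) and accuracy (46/47), with 31 positives and 16 negatives in that 47-sample fold. The ROC AUC line assumes hard 0/1 scores, which is when AUC reduces to balanced accuracy. The function does not guard against zero denominators.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion counts (sketch; no zero-division guards)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "mcc": (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        # With hard 0/1 scores, ROC AUC equals balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        # Jaccard index of the positive class.
        "jcc": tp / (tp + fp + fn),
    }

# Counts inferred (not logged) for Decision Tree fold 1.
scores = binary_metrics(tp=31, fp=1, fn=0, tn=15)
for key, value in scores.items():
    print(f"test_{key}: {value:.8f}")
```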

MCC on Blind test: 0.06

Accuracy on Blind test: 0.2

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10349464 0.09808111 0.10309243 0.10495615 0.10384583 0.10514021
 0.10287976 0.10301304 0.10425162 0.10234761]

mean value: 0.10311024188995362

key: score_time
value: [0.01685739 0.01713133 0.01867747 0.01792812 0.01854682 0.01873803
 0.01870346 0.01733375 0.01832008 0.01786637]

mean value: 0.018010282516479494

key: test_mcc
value: [0.90662544 0.8084425  0.81503725 0.90662544 0.86070252 0.76032282
 0.81048387 0.85009261 0.8059304  0.90571105]

mean value: 0.8429973908395795

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95744681 0.91489362 0.91489362 0.95744681 0.93617021 0.89361702
 0.91489362 0.93478261 0.91304348 0.95652174]

mean value: 0.9293709528214616

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96875    0.9375     0.93939394 0.96875    0.95384615 0.92307692
 0.93548387 0.95238095 0.93939394 0.96774194]

mean value: 0.9486317714543521

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.93939394 0.90909091 0.88571429 0.93939394 0.91176471 0.88235294
 0.93548387 0.9375     0.88571429 0.9375    ]

mean value: 0.9163908877333925

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 0.93548387 0.96774194 1.         1.        ]

mean value: 0.9838709677419355

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9375     0.89012097 0.875      0.9375     0.90625    0.85887097
 0.90524194 0.9172043  0.86666667 0.9375    ]

mean value: 0.9031854838709678

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93939394 0.88235294 0.88571429 0.93939394 0.91176471 0.85714286
 0.87878788 0.90909091 0.88571429 0.9375    ]

mean value: 0.9026855742296919

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.2

Accuracy on Blind test: 0.36

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00836992 0.00826001 0.00835204 0.00824237 0.00821042 0.00798821
 0.00828552 0.00832677 0.00851941 0.00838804]

mean value: 0.008294272422790527

key: score_time
value: [0.00871825 0.00869298 0.00867295 0.0086019  0.00861168 0.00866127
 0.0086937  0.00873017 0.0088346  0.0087533 ]

mean value: 0.008697080612182616

key: test_mcc
value: [0.86091836 0.71206211 0.65309894 0.81952077 0.8084425  0.65994312
 0.50614703 0.60602162 0.44695591 0.72379255]

mean value: 0.6796902925193711

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.93617021 0.87234043 0.82978723 0.91489362 0.91489362 0.85106383
 0.76595745 0.80434783 0.76086957 0.86956522]

mean value: 0.8519888991674376

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.95081967 0.90909091 0.86206897 0.93333333 0.9375     0.89230769
 0.81355932 0.84210526 0.82539683 0.89655172]

mean value: 0.8862733707106872

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96666667 0.85714286 0.92592593 0.96551724 0.90909091 0.85294118
 0.85714286 0.92307692 0.8125     0.92857143]

mean value: 0.8998575985467466

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.93548387 0.96774194 0.80645161 0.90322581 0.96774194 0.93548387
 0.77419355 0.77419355 0.83870968 0.86666667]

mean value: 0.8769892473118279

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.93649194 0.82762097 0.84072581 0.9203629  0.89012097 0.81149194
 0.76209677 0.82043011 0.71935484 0.87083333]

mean value: 0.8399529569892473

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90625    0.83333333 0.75757576 0.875      0.88235294 0.80555556
 0.68571429 0.72727273 0.7027027  0.8125    ]

mean value: 0.7988257303330832

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.38
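The Extra Tree fold-1 scores above all fit a single confusion matrix. Assume the test fold has 47 rows, as the first seven of ten folds over 467 rows would. The logged precision 0.96666667 = 29/30 and recall 0.93548387 = 29/31 then imply TP = 29, FP = 1, FN = 2, TN = 15. The sketch below uses only the standard library to recompute every fold-1 metric from those inferred counts.

It also shows that the logged test_roc_auc equals balanced accuracy. That is what ROC AUC reduces to when the scores are hard 0/1 values, as a single fully grown tree's probabilities usually are. This reading is an inference; the log does not state it.

```python
import math

# Fold-1 confusion-matrix counts for the Extra Tree run, INFERRED from the
# logged precision (29/30), recall (29/31) and an assumed test-fold size of 47.
TP, FP, FN = 29, 1, 2
TN = 47 - TP - FP - FN  # 15


def fold_metrics(tp, fp, fn, tn):
    """Binary classification metrics computed directly from confusion-matrix counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),  # F1 score
        "jcc": tp / (tp + fp + fn),             # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # ROC AUC of hard 0/1 scores collapses to balanced accuracy
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Fold-1 values copied from the Extra Tree block above
logged = {
    "accuracy": 0.93617021,
    "precision": 0.96666667,
    "recall": 0.93548387,
    "fscore": 0.95081967,
    "jcc": 0.90625,
    "mcc": 0.86091836,
    "balanced_accuracy": 0.93649194,  # printed as test_roc_auc
}

computed = fold_metrics(TP, FP, FN, TN)
for name, value in computed.items():
    print(f"{name:18s} computed={value:.8f} logged={logged[name]:.8f}")
```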

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.29370928 1.2518785  1.24979663 1.24180865 1.26994014 1.25986075
 1.2572484  1.2555747  1.23349094 1.23494911]

mean value: 1.2548257112503052

key: score_time
value: [0.09408879 0.09164119 0.08997083 0.09628367 0.09728193 0.1462996
 0.09323502 0.08956718 0.08982635 0.08968997]

mean value: 0.09778845310211182

key: test_mcc
value: [1.         0.8566725  1.         1.         0.90662544 0.81503725
 1.         0.95250095 0.95087679 0.85513419]

mean value: 0.9336847119207848

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93617021 1.         1.         0.95744681 0.91489362
 1.         0.97826087 0.97826087 0.93478261]

mean value: 0.9699814986123959

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 1.         1.         0.96875    0.93939394
 1.         0.98360656 0.98412698 0.95081967]

mean value: 0.9779078105410073

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9375     1.         1.         0.93939394 0.88571429
 1.         1.         0.96875    0.93548387]

mean value: 0.9666842096075967

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         1.         1.
 1.         0.96774194 1.         0.96666667]

mean value: 0.9902150537634409

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.92137097 1.         1.         0.9375     0.875
 1.         0.98387097 0.96666667 0.92083333]

mean value: 0.9605241935483871

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 1.         1.         0.93939394 0.88571429
 1.         0.96774194 0.96875    0.90625   ]

mean value: 0.9576941069683005

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.07

Accuracy on Blind test: 0.18
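Every model block prints the same keys: fit_time and score_time, then test_/train_ pairs for mcc, accuracy, fscore, precision, recall, roc_auc and jcc. That layout matches what scikit-learn's `cross_validate` returns when it gets a dict of named scorers and `return_train_score=True`. The sketch below assumes the script works this way.

The following are illustrative assumptions, not the script's real inputs:
- the synthetic data
- the four borrowed column names
- the `StratifiedKFold(n_splits=10, shuffle=True, random_state=42)` splitter

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import ExtraTreeClassifier

# Synthetic stand-in data (ASSUMPTION): two numerical and two categorical columns
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
    "active_site": rng.choice([0, 1], n),
})
y = (X["ligand_distance"] < 15).astype(int).to_numpy()

# Same shape as the logged pipeline: scale numericals, one-hot encode categoricals
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class", "active_site"]),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", ExtraTreeClassifier(random_state=42))])

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

The loop prints train scores next to test scores, which makes the gap in the Random Forest block above easy to see. Training scores of 1.0 beside a blind-test MCC of 0.07 suggest that the default, fully grown trees fit the training folds almost perfectly without generalising to the blind set.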

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [1.75414562 0.86297989 0.94754958 0.91948628 0.92758203 1.00107074
 0.93352938 0.92137861 0.88540673 0.90511346]

mean value: 1.0058242321014403

key: score_time
value: [0.23915219 0.2850039  0.25384307 0.23436403 0.24242306 0.2717557
 0.25083756 0.22900653 0.23912811 0.27642059]

mean value: 0.2521934747695923

key: test_mcc
value: [1.         0.8084425  0.90662544 1.         0.90662544 0.81503725
 1.         0.90107527 0.95087679 0.80651412]

mean value: 0.9095196821072326

key: train_mcc
value: [0.94694186 0.96278526 0.94694186 0.94694186 0.94694186 0.96278526
 0.95221511 0.95793986 0.95769694 0.96282875]

mean value: 0.9544018630875426

key: test_accuracy
value: [1.         0.91489362 0.95744681 1.         0.95744681 0.91489362
 1.         0.95652174 0.97826087 0.91304348]

mean value: 0.9592506938020352

key: train_accuracy
value: [0.97619048 0.98333333 0.97619048 0.97619048 0.97619048 0.98333333
 0.97857143 0.98099762 0.98099762 0.98337292]

mean value: 0.9795368171021377

key: test_fscore
value: [1.         0.9375     0.96875    1.         0.96875    0.93939394
 1.         0.96774194 0.98412698 0.93548387]

mean value: 0.9701746729972536

key: train_fscore
value: [0.9822695  0.98752228 0.9822695  0.9822695  0.9822695  0.98752228
 0.98401421 0.9858156  0.98576512 0.98756661]

mean value: 0.9847284121907804

key: test_precision
value: [1.         0.90909091 0.93939394 1.         0.93939394 0.88571429
 1.         0.96774194 0.96875    0.90625   ]

mean value: 0.9516335009076945

key: train_precision
value: [0.96853147 0.97879859 0.96853147 0.96853147 0.96853147 0.97879859
 0.97192982 0.97202797 0.97535211 0.97887324]

mean value: 0.9729906195972802

key: test_recall
value: [1.         0.96774194 1.         1.         1.         1.
 1.         0.96774194 1.         0.96666667]

mean value: 0.9902150537634409

key: train_recall
value: [0.99640288 0.99640288 0.99640288 0.99640288 0.99640288 0.99640288
 0.99640288 1.         0.99640288 0.99641577]

mean value: 0.9967638792192053

key: test_roc_auc
value: [1.         0.89012097 0.9375     1.         0.9375     0.875
 1.         0.95053763 0.96666667 0.88958333]

mean value: 0.9446908602150538

key: train_roc_auc
value: [0.9665113  0.97707468 0.9665113  0.9665113  0.9665113  0.97707468
 0.97003242 0.97202797 0.97372591 0.97708112]

mean value: 0.9713061984493545

key: test_jcc
value: [1.         0.88235294 0.93939394 1.         0.93939394 0.88571429
 1.         0.9375     0.96875    0.87878788]

mean value: 0.9431892984466514

key: train_jcc
value: [0.96515679 0.97535211 0.96515679 0.96515679 0.96515679 0.97535211
 0.96853147 0.97202797 0.97192982 0.9754386 ]

mean value: 0.9699259264664533

MCC on Blind test: 0.08

Accuracy on Blind test: 0.19
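The FutureWarning for Random Forest2 names its own fix. For classifiers, `max_features='auto'` behaves like `'sqrt'`, which has also been the default since scikit-learn 1.1. Below is a minimal sketch of the same estimator with the deprecated value replaced; the other hyperparameters are copied from the Model func line.

```python
from sklearn.ensemble import RandomForestClassifier

# 'Random Forest2' settings from the log, with the deprecated
# max_features='auto' swapped for its documented equivalent, 'sqrt'.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)
print(rf2)
```

The warning will only disappear if the 'Random Forest2' entry in the model list gets the same change. Because `'sqrt'` is the default, dropping the argument altogether should work equally well.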

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01817513 0.0076189  0.00755739 0.00759125 0.0075407  0.00757432
 0.00772119 0.0076313  0.0076437  0.00751853]

mean value: 0.008657240867614746

key: score_time
value: [0.0108695  0.00802517 0.00807548 0.00796461 0.00793648 0.00793743
 0.00874805 0.00799894 0.00802493 0.00804806]

mean value: 0.008362865447998047

key: test_mcc
value: [0.76746995 0.61207663 0.31752781 0.71206211 0.76032282 0.6139232
 0.66402366 0.59332241 0.38733878 0.70954337]

mean value: 0.6137610732708011

key: train_mcc
value: [0.62791789 0.64521328 0.66619129 0.63945586 0.63982246 0.63982246
 0.6506538  0.65794031 0.65846852 0.63442864]

mean value: 0.6459914516114823

key: test_accuracy
value: [0.89361702 0.82978723 0.70212766 0.87234043 0.89361702 0.82978723
 0.85106383 0.82608696 0.73913043 0.86956522]

mean value: 0.8307123034227567

key: train_accuracy
value: [0.83809524 0.8452381  0.85238095 0.84285714 0.84285714 0.84285714
 0.84761905 0.85035629 0.85035629 0.84085511]

mean value: 0.8453472457866757

key: test_fscore
value: [0.91803279 0.875      0.78125    0.90909091 0.92307692 0.88235294
 0.88888889 0.875      0.8125     0.90625   ]

mean value: 0.8771442449118437

key: train_fscore
value: [0.88316151 0.88773748 0.89007092 0.8862069  0.88581315 0.88581315
 0.88965517 0.89156627 0.89081456 0.88468158]

mean value: 0.8875520685563664

key: test_precision
value: [0.93333333 0.84848485 0.75757576 0.85714286 0.88235294 0.81081081
 0.875      0.84848485 0.78787879 0.85294118]

mean value: 0.8454005361358302

key: train_precision
value: [0.84539474 0.8538206  0.87762238 0.85099338 0.85333333 0.85333333
 0.85430464 0.85478548 0.85953177 0.85099338]

mean value: 0.8554113020989377

key: test_recall
value: [0.90322581 0.90322581 0.80645161 0.96774194 0.96774194 0.96774194
 0.90322581 0.90322581 0.83870968 0.96666667]

mean value: 0.9127956989247312

key: train_recall
value: [0.92446043 0.92446043 0.9028777  0.92446043 0.92086331 0.92086331
 0.92805755 0.93165468 0.92446043 0.92114695]

mean value: 0.9223305226786314

key: test_roc_auc
value: [0.8891129  0.7953629  0.65322581 0.82762097 0.85887097 0.76512097
 0.8266129  0.78494624 0.68602151 0.82708333]

mean value: 0.7913978494623656

key: train_roc_auc
value: [0.79673726 0.80730064 0.82819941 0.80377951 0.80550208 0.80550208
 0.8090992  0.81198118 0.81537707 0.80212277]

mean value: 0.8085601200017799

key: test_jcc
value: [0.84848485 0.77777778 0.64102564 0.83333333 0.85714286 0.78947368
 0.8        0.77777778 0.68421053 0.82857143]

mean value: 0.783779787463998

key: train_jcc
value: [0.79076923 0.79813665 0.80191693 0.79566563 0.79503106 0.79503106
 0.80124224 0.80434783 0.803125   0.79320988]

mean value: 0.7978475494770487

MCC on Blind test: 0.24

Accuracy on Blind test: 0.47
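Each "mean value" line is the plain average of the ten fold scores, not an accuracy pooled over all rows. Since 467 rows do not split evenly into ten folds, the two figures differ slightly.

The sketch below uses only the standard library. It rebuilds the Naive Bayes test_accuracy mean from per-fold correct counts inferred from the logged fractions (for example 0.89361702 = 42/47). It assumes folds of 47 rows for the first seven and 46 for the last three.

```python
from fractions import Fraction

# ASSUMED fold sizes: 467 = 7 * 47 + 3 * 46, consistent with the logged denominators
n_rows, n_folds = 467, 10
base, extra = divmod(n_rows, n_folds)  # 46, 7
fold_sizes = [base + 1] * extra + [base] * (n_folds - extra)

# Correct predictions per fold, INFERRED from the Naive Bayes test_accuracy list
correct = [42, 39, 33, 41, 42, 39, 40, 38, 34, 40]
fold_acc = [Fraction(c, s) for c, s in zip(correct, fold_sizes)]

unweighted_mean = float(sum(fold_acc) / n_folds)  # what "mean value" reports
pooled = sum(correct) / n_rows                     # accuracy over all 467 rows

print(fold_sizes)
print(f"unweighted fold mean: {unweighted_mean:.10f}")
print(f"pooled accuracy:      {pooled:.10f}")
```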

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.08977652 0.0446949  0.05097675 0.05265212 0.04966545 0.04864025
 0.2227385  0.04262686 0.0463593  0.04706073]

mean value: 0.06951913833618165

key: score_time
value: [0.00969934 0.00960755 0.00962806 0.0097065  0.00962687 0.01001763
 0.01041269 0.01037621 0.0100019  0.01042318]

mean value: 0.009949994087219239

key: test_mcc
value: [1.         0.8566725  1.         1.         0.90524194 0.86070252
 1.         0.95250095 0.95087679 0.85513419]

mean value: 0.9381128880260178

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93617021 1.         1.         0.95744681 0.93617021
 1.         0.97826087 0.97826087 0.93478261]

mean value: 0.972109158186864

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 1.         1.         0.96774194 0.95384615
 1.         0.98360656 0.98412698 0.95081967]

mean value: 0.9792522255346158

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9375     1.         1.         0.96774194 0.91176471
 1.         1.         0.96875    0.93548387]

mean value: 0.9721240512333966

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         0.96774194 1.
 1.         0.96774194 1.         0.96666667]

mean value: 0.986989247311828

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.92137097 1.         1.         0.95262097 0.90625
 1.         0.98387097 0.96666667 0.92083333]

mean value: 0.9651612903225807

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 1.         1.         0.9375     0.91176471
 1.         0.96774194 0.96875    0.90625   ]

mean value: 0.9601097550457133

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
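The key / value / mean value blocks above match the dictionary returned by scikit-learn's cross_validate when return_train_score=True. That dictionary holds fit_time and score_time, followed by one test_<name> / train_<name> pair per scorer, in that order. The sketch below is a hypothetical reconstruction, not the script's actual code. The scorer names are inferred from the printed keys. The synthetic data, the LogisticRegression model and the 10-fold StratifiedKFold are assumptions.

```python
# Hypothetical reconstruction of how key/value/mean blocks like these could be produced.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names chosen so cross_validate emits test_mcc, train_jcc, etc. (inferred from the log)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the real training data
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

scores = cross_validate(
    LogisticRegression(random_state=42),
    X,
    y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    scoring=scoring,
    return_train_score=True,  # adds the train_* keys
)

# Print in the same layout as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each "mean value" is the plain mean over the ten folds. Averaging the ten XGBoost test_mcc values above, for example, gives 0.93811..., as printed.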

MCC on Blind test: 0.08

Accuracy on Blind test: 0.2
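The blind-test MCC (0.08) is far below both the cross-validated test MCC (mean 0.938) and the perfect training MCC (1.0). To recall what the score measures, here is a minimal sketch that computes MCC from binary confusion-matrix counts. The helper name is illustrative, and the example counts are made up rather than taken from this run.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the degenerate denominator (also used by scikit-learn).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only: a classifier that calls most negatives positive
print(mcc_from_counts(tp=10, tn=10, fp=80, fn=0))  # accuracy 0.2, MCC about 0.11
```

Because MCC uses all four cells of the confusion matrix, it stays low whenever one class is largely misclassified, even if the other class is predicted well.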

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])
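The printed pipeline scales the numeric columns with MinMaxScaler and one-hot encodes the categorical columns. With remainder='passthrough', any other columns are passed through unchanged. The sketch below reproduces the same structure on a tiny made-up DataFrame. It assumes only four of the real column names: ligand_distance and rsa as numeric, ss_class and active_site as categorical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up data; only the column names are borrowed from the log
df = pd.DataFrame({
    "ligand_distance": [3.1, 7.4, 12.0, 5.5, 9.9, 15.2, 4.0, 11.1],
    "rsa": [0.10, 0.45, 0.80, 0.30, 0.65, 0.95, 0.20, 0.70],
    "ss_class": ["H", "E", "L", "H", "E", "L", "H", "E"],
    "active_site": [1, 0, 0, 1, 0, 0, 1, 0],
})
y = [1, 0, 1, 1, 0, 0, 1, 0]

# Column groups as pandas Index objects, like the ones printed above
num_cols = pd.Index(["ligand_distance", "rsa"])
cat_cols = pd.Index(["ss_class", "active_site"])

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # each column rescaled to [0, 1] on training data
        ("cat", OneHotEncoder(), cat_cols),  # one 0/1 column per category level
    ],
    remainder="passthrough",  # unlisted columns are kept unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", LinearDiscriminantAnalysis())])
pipe.fit(df, y)

# 2 scaled + 3 ss_class levels + 2 active_site levels = 7 output columns
print(pipe.named_steps["prep"].transform(df).shape)
```

One-hot columns are collinear by construction, so LDA may emit a "Variables are collinear" warning on data like this.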

key: fit_time
value: [0.01633263 0.01602507 0.03087282 0.03793883 0.03829098 0.03755164
 0.03849244 0.04578662 0.03847647 0.0386765 ]

mean value: 0.033844399452209475

key: score_time
value: [0.01047325 0.01068068 0.02036643 0.01072168 0.01989603 0.02082086
 0.02522516 0.01081634 0.0206635  0.02184916]

mean value: 0.017151308059692384

key: test_mcc
value: [0.95436677 0.8566725  1.         1.         0.90662544 0.81503725
 1.         0.9085301  0.90107527 0.75776742]

mean value: 0.9100074758399945

key: train_mcc
value: [0.94131391 0.95204958 0.93598399 0.94131391 0.94674008 0.95734993
 0.93066133 0.9469026  0.9469923  0.95754545]

mean value: 0.9456853089391832

key: test_accuracy
value: [0.9787234  0.93617021 1.         1.         0.95744681 0.91489362
 1.         0.95652174 0.95652174 0.89130435]

mean value: 0.9591581868640148

key: train_accuracy
value: [0.97380952 0.97857143 0.97142857 0.97380952 0.97619048 0.98095238
 0.96904762 0.97624703 0.97624703 0.98099762]

mean value: 0.9757301210270332

key: test_fscore
value: [0.98360656 0.95238095 1.         1.         0.96875    0.93939394
 1.         0.96666667 0.96774194 0.91803279]

mean value: 0.9696572838187725

key: train_fscore
value: [0.98039216 0.98395722 0.97864769 0.98039216 0.98220641 0.98566308
 0.97690941 0.98214286 0.98220641 0.9858156 ]

mean value: 0.9818332987468832

key: test_precision
value: [1.         0.9375     1.         1.         0.93939394 0.88571429
 1.         1.         0.96774194 0.90322581]

mean value: 0.9633575967043709

key: train_precision
value: [0.97173145 0.97526502 0.96830986 0.97173145 0.97183099 0.98214286
 0.96491228 0.9751773  0.97183099 0.9754386 ]

mean value: 0.972837078548064

key: test_recall
value: [0.96774194 0.96774194 1.         1.         1.         1.
 1.         0.93548387 0.96774194 0.93333333]

mean value: 0.9772043010752688

key: train_recall
value: [0.98920863 0.99280576 0.98920863 0.98920863 0.99280576 0.98920863
 0.98920863 0.98920863 0.99280576 0.99641577]

mean value: 0.991008483535752

key: test_roc_auc
value: [0.98387097 0.92137097 1.         1.         0.9375     0.875
 1.         0.96774194 0.95053763 0.87291667]

mean value: 0.9508938172043011

key: train_roc_auc
value: [0.9664353  0.97175499 0.96291418 0.9664353  0.96823386 0.97699868
 0.95939305 0.97012879 0.96843085 0.97356   ]

mean value: 0.9684285006076279

key: test_jcc
value: [0.96774194 0.90909091 1.         1.         0.93939394 0.88571429
 1.         0.93548387 0.9375     0.84848485]

mean value: 0.9423409789135595

key: train_jcc
value: [0.96153846 0.96842105 0.95818815 0.96153846 0.96503497 0.97173145
 0.95486111 0.96491228 0.96503497 0.97202797]

mean value: 0.9643288871692625
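The test_jcc values track test_fscore closely because, for a single positive class, the Jaccard index and F1 are monotone transforms of each other. Writing J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) gives J = F1/(2 - F1). A quick check against the first LDA fold:

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the Jaccard index via J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# First LDA fold above: test_fscore 0.98360656 -> test_jcc 0.96774194
print(round(jaccard_from_f1(0.98360656), 8))
```

Because the mapping is monotone, ranking models by Jaccard gives the same per-fold order as ranking by F1.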

MCC on Blind test: 0.13

Accuracy on Blind test: 0.3

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.0186336  0.00766277 0.00761127 0.00744534 0.00747466 0.00750709
 0.00777817 0.00826359 0.00808811 0.00809813]

mean value: 0.00885627269744873

key: score_time
value: [0.00869727 0.00830197 0.00809574 0.00786757 0.00828147 0.00783634
 0.0086298  0.00836444 0.00861764 0.00868964]

mean value: 0.008338189125061036

key: test_mcc
value: [0.8566725  0.65994312 0.45918373 0.76032282 0.66337469 0.6139232
 0.52620968 0.64852426 0.50537634 0.76764947]

mean value: 0.6461179816200634

key: train_mcc
value: [0.62766379 0.63945586 0.68424763 0.64471064 0.6504316  0.67304969
 0.67293578 0.65214979 0.67466169 0.65101792]

mean value: 0.6570324374013666

key: test_accuracy
value: [0.93617021 0.85106383 0.76595745 0.89361702 0.85106383 0.82978723
 0.78723404 0.84782609 0.7826087  0.89130435]

mean value: 0.8436632747456059

key: train_accuracy
value: [0.83809524 0.84285714 0.86190476 0.8452381  0.84761905 0.85714286
 0.85714286 0.847981   0.85748219 0.847981  ]

mean value: 0.8503444180522566

key: test_fscore
value: [0.95238095 0.89230769 0.83076923 0.92307692 0.89552239 0.88235294
 0.83870968 0.88888889 0.83870968 0.92307692]

mean value: 0.8865795294575493

key: train_fscore
value: [0.88356164 0.8862069  0.9        0.88850772 0.89003436 0.89655172
 0.89726027 0.89041096 0.89726027 0.89003436]

mean value: 0.8919828218593321

key: test_precision
value: [0.9375     0.85294118 0.79411765 0.88235294 0.83333333 0.81081081
 0.83870968 0.875      0.83870968 0.85714286]

mean value: 0.8520618120831593

key: train_precision
value: [0.84313725 0.85099338 0.86423841 0.84918033 0.85197368 0.86092715
 0.85620915 0.8496732  0.85620915 0.85478548]

mean value: 0.8537327189194519

key: test_recall
value: [0.96774194 0.93548387 0.87096774 0.96774194 0.96774194 0.96774194
 0.83870968 0.90322581 0.83870968 1.        ]

mean value: 0.9258064516129032

key: train_recall
value: [0.92805755 0.92446043 0.93884892 0.93165468 0.93165468 0.9352518
 0.94244604 0.9352518  0.94244604 0.92831541]

mean value: 0.9338387354632423

key: test_roc_auc
value: [0.92137097 0.81149194 0.71673387 0.85887097 0.79637097 0.76512097
 0.76310484 0.81827957 0.75268817 0.84375   ]

mean value: 0.8047782258064516

key: train_roc_auc
value: [0.79501469 0.80377951 0.82505826 0.80385551 0.80737663 0.81973858
 0.81629344 0.80678674 0.81737687 0.80922813]

mean value: 0.8104508362630897

key: test_jcc
value: [0.90909091 0.80555556 0.71052632 0.85714286 0.81081081 0.78947368
 0.72222222 0.8        0.72222222 0.85714286]

mean value: 0.7984187434187434

key: train_jcc
value: [0.79141104 0.79566563 0.81818182 0.79938272 0.80185759 0.8125
 0.8136646  0.80246914 0.8136646  0.80185759]

mean value: 0.80506547104786

MCC on Blind test: 0.21

Accuracy on Blind test: 0.45
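MultinomialNB expects non-negative features, and scikit-learn rejects negative values when fitting it. In this pipeline the one-hot columns are 0/1, and MinMaxScaler maps each numeric column's training range onto [0, 1]. Values in unseen data, such as the blind test set, can still fall outside the training range and scale below 0 or above 1. The plain-Python sketch below shows the effect with hypothetical values. The helper names are illustrative. scikit-learn's MinMaxScaler has a clip option (from version 0.24) that clamps the output the same way.

```python
from typing import List, Tuple


def fit_min_max(train: List[float]) -> Tuple[float, float]:
    """Learn a column's minimum and maximum from training data only."""
    return min(train), max(train)


def transform_min_max(values: List[float], lo: float, hi: float,
                      clip: bool = False) -> List[float]:
    """Scale to [0, 1] using the training range; optionally clamp out-of-range values."""
    span = hi - lo if hi != lo else 1.0  # guard against a constant column
    scaled = [(v - lo) / span for v in values]
    if clip:
        scaled = [min(max(s, 0.0), 1.0) for s in scaled]
    return scaled


lo, hi = fit_min_max([2.0, 4.0, 6.0])  # hypothetical training column
print(transform_min_max([1.0, 4.0, 8.0], lo, hi))             # [-0.25, 0.5, 1.5]
print(transform_min_max([1.0, 4.0, 8.0], lo, hi, clip=True))  # [0.0, 0.5, 1.0]
```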

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00986862 0.01313043 0.0124433  0.01303792 0.01351666 0.01400781
 0.01293039 0.01448417 0.01343918 0.01248717]

mean value: 0.012934565544128418

key: score_time
value: [0.00865817 0.00993657 0.0099678  0.01048064 0.01074982 0.0105195
 0.01045227 0.01052117 0.01057601 0.01054454]

mean value: 0.010240650177001953

key: test_mcc
value: [1.         0.8566725  1.         0.95436677 0.90662544 0.81503725
 0.90524194 0.9085301  0.7725558  0.85513419]

mean value: 0.8974163989404769

key: train_mcc
value: [0.93593571 0.9627116  0.92552437 0.92120646 0.92557595 0.85221677
 0.93598399 0.94195411 0.93206488 0.89469123]

mean value: 0.9227865066682192

key: test_accuracy
value: [1.         0.93617021 1.         0.9787234  0.95744681 0.91489362
 0.95744681 0.95652174 0.89130435 0.93478261]

mean value: 0.9527289546716003

key: train_accuracy
value: [0.97142857 0.98333333 0.96666667 0.96428571 0.96666667 0.93333333
 0.97142857 0.97387173 0.96912114 0.95249406]

mean value: 0.965262979300984

key: test_fscore
value: [1.         0.95238095 1.         0.98360656 0.96875    0.93939394
 0.96774194 0.96666667 0.91525424 0.95081967]

mean value: 0.9644613960721762

key: train_fscore
value: [0.97857143 0.98743268 0.97482014 0.97277677 0.97526502 0.95172414
 0.97864769 0.98053097 0.97640653 0.96527778]

mean value: 0.9741453144247227

key: test_precision
value: [1.         0.9375     1.         1.         0.93939394 0.88571429
 0.96774194 1.         0.96428571 0.93548387]

mean value: 0.9630119745845552

key: train_precision
value: [0.97163121 0.98566308 0.97482014 0.98168498 0.95833333 0.91390728
 0.96830986 0.96515679 0.98534799 0.93602694]

mean value: 0.9640881606737391

key: test_recall
value: [1.         0.96774194 1.         0.96774194 1.         1.
 0.96774194 0.93548387 0.87096774 0.96666667]

mean value: 0.9676344086021506

key: train_recall
value: [0.98561151 0.98920863 0.97482014 0.96402878 0.99280576 0.99280576
 0.98920863 0.99640288 0.9676259  0.99641577]

mean value: 0.984893375622083

key: test_roc_auc
value: [1.         0.92137097 1.         0.98387097 0.9375     0.875
 0.95262097 0.96774194 0.90215054 0.92083333]

mean value: 0.946108870967742

key: train_roc_auc
value: [0.96463674 0.98051981 0.96276218 0.96440875 0.95414936 0.90485358
 0.96291418 0.9632364  0.96982694 0.93130648]

mean value: 0.9558614420708662

key: test_jcc
value: [1.         0.90909091 1.         0.96774194 0.93939394 0.88571429
 0.9375     0.93548387 0.84375    0.90625   ]

mean value: 0.9324924940650747

key: train_jcc
value: [0.95804196 0.9751773  0.95087719 0.94699647 0.95172414 0.90789474
 0.95818815 0.96180556 0.95390071 0.93288591]

mean value: 0.9497492121318976

MCC on Blind test: 0.09

Accuracy on Blind test: 0.27

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0119803  0.012357   0.01353669 0.01262236 0.01239181 0.01188588
 0.01398087 0.01292968 0.01270461 0.01215243]

mean value: 0.01265416145324707

key: score_time
value: [0.01043653 0.01049995 0.01050258 0.0104773  0.01051712 0.01048827
 0.0106318  0.01071954 0.01067996 0.01075029]

mean value: 0.010570335388183593

key: test_mcc
value: [1.         0.8084425  0.87213027 0.95299692 0.90662544 0.78063446
 0.95299692 0.85009261 0.81245565 0.76471368]

mean value: 0.8701088462869901

key: train_mcc
value: [0.93057824 0.96269263 0.86379539 0.93066133 0.86786568 0.85610492
 0.94674008 0.88991881 0.94166847 0.91286344]

mean value: 0.9102888984394315

key: test_accuracy
value: [1.         0.91489362 0.93617021 0.9787234  0.95744681 0.89361702
 0.9787234  0.93478261 0.91304348 0.89130435]

mean value: 0.9398704902867715

key: train_accuracy
value: [0.96904762 0.98333333 0.93571429 0.96904762 0.94047619 0.93095238
 0.97619048 0.95011876 0.97387173 0.95961995]

mean value: 0.9588372356068318

key: test_fscore
value: [1.         0.9375     0.94915254 0.98412698 0.96875    0.91525424
 0.98412698 0.95238095 0.93333333 0.91525424]

mean value: 0.9539879270917406

key: train_fscore
value: [0.97682709 0.98747764 0.94990724 0.97690941 0.95667244 0.94579439
 0.98220641 0.96347826 0.98025135 0.96892139]

mean value: 0.9688445621247324

key: test_precision
value: [1.         0.90909091 1.         0.96875    0.93939394 0.96428571
 0.96875    0.9375     0.96551724 0.93103448]

mean value: 0.9584322286908494

key: train_precision
value: [0.96819788 0.98220641 0.98084291 0.96491228 0.92307692 0.9844358
 0.97183099 0.93265993 0.97849462 0.98880597]

mean value: 0.9675463711254643

key: test_recall
value: [1.         0.96774194 0.90322581 1.         1.         0.87096774
 1.         0.96774194 0.90322581 0.9       ]

mean value: 0.9512903225806452

key: train_recall
value: [0.98561151 0.99280576 0.92086331 0.98920863 0.99280576 0.91007194
 0.99280576 0.99640288 0.98201439 0.94982079]

mean value: 0.971241071658802

key: test_roc_auc
value: [1.         0.89012097 0.9516129  0.96875    0.9375     0.90423387
 0.96875    0.9172043  0.91827957 0.8875    ]

mean value: 0.9343951612903226

key: train_roc_auc
value: [0.96111561 0.97879724 0.94282602 0.95939305 0.91541696 0.94095146
 0.96823386 0.92827137 0.97002817 0.96434701]

mean value: 0.9529380774427173

key: test_jcc
value: [1.         0.88235294 0.90322581 0.96875    0.93939394 0.84375
 0.96875    0.90909091 0.875      0.84375   ]

mean value: 0.9134063596112932

key: train_jcc
value: [0.95470383 0.97526502 0.90459364 0.95486111 0.91694352 0.89716312
 0.96503497 0.9295302  0.96126761 0.93971631]

mean value: 0.9399079327337388

MCC on Blind test: 0.07

Accuracy on Blind test: 0.21

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.1008575  0.08776975 0.08655286 0.0874176  0.08782935 0.08918238
 0.09195852 0.09141636 0.09183121 0.08885765]

mean value: 0.09036731719970703

key: score_time
value: [0.01442814 0.0153048  0.01412559 0.01522112 0.01434422 0.0145371
 0.01523519 0.01551008 0.0142715  0.01540041]

mean value: 0.014837813377380372

key: test_mcc
value: [0.90524194 0.8566725  0.95436677 1.         0.90662544 0.81503725
 0.95436677 0.95250095 0.95087679 0.75806977]

mean value: 0.9053758183184529

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.95744681 0.93617021 0.9787234  1.         0.95744681 0.91489362
 0.9787234  0.97826087 0.97826087 0.89130435]

mean value: 0.9571230342275671

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96774194 0.95238095 0.98360656 1.         0.96875    0.93939394
 0.98360656 0.98360656 0.98412698 0.92063492]

mean value: 0.9683848404151815

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96774194 0.9375     1.         1.         0.93939394 0.88571429
 1.         1.         0.96875    0.87878788]

mean value: 0.9577888039379975

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 0.96774194 0.96774194 1.         1.         1.
 0.96774194 0.96774194 1.         0.96666667]

mean value: 0.9805376344086022

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.95262097 0.92137097 0.98387097 1.         0.9375     0.875
 0.98387097 0.98387097 0.96666667 0.85833333]

mean value: 0.9463104838709677

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9375     0.90909091 0.96774194 1.         0.93939394 0.88571429
 0.96774194 0.96774194 0.96875    0.85294118]

mean value: 0.9396616117121336

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.19
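The Bagging Classifier run below emits a UserWarning that some inputs have no OOB scores, followed by a RuntimeWarning from the division that builds the OOB decision function. With bootstrap sampling, each estimator draws n rows with replacement, so a given row appears in one bootstrap sample with probability 1 - (1 - 1/n)^n, roughly 0.632. BaggingClassifier defaults to 10 estimators. A row lands in every sample, and is therefore never out-of-bag, with probability of roughly 0.632^10, about 1%. The sketch below estimates how many rows that affects. It assumes about 420 training rows per fold (467 rows, 10-fold CV), and the helper name is illustrative.

```python
import math


def expected_samples_without_oob(n_samples: int, n_estimators: int) -> float:
    """Expected number of training rows drawn into every bootstrap sample,
    i.e. rows that are never out-of-bag and so get no OOB prediction."""
    p_in_one = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples  # ~0.632 for large n
    return n_samples * p_in_one ** n_estimators


print(round(expected_samples_without_oob(420, 10), 2))   # a handful of rows
print(expected_samples_without_oob(420, 100))            # effectively zero
```

Rows with no OOB prediction have a zero row sum in the prediction matrix, which is what triggers the invalid-division RuntimeWarning. Raising n_estimators makes the expected count negligible.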

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03681731 0.03343534 0.04631495 0.04816437 0.05419993 0.04136348
 0.03020048 0.03114557 0.05245137 0.04301977]

mean value: 0.04171125888824463

key: score_time
value: [0.02169442 0.01837158 0.02844691 0.01603293 0.03581977 0.02635193
 0.0178473  0.01740122 0.02229071 0.01603532]

mean value: 0.02202920913696289

key: test_mcc
value: [0.95299692 0.8566725  1.         1.         0.8566725  0.81503725
 0.91188882 0.95250095 0.95087679 0.85927505]

mean value: 0.9155920774240871

key: train_mcc
value: [0.97879832 1.         0.99468526 0.98945277 0.98408467 0.99468526
 0.98945277 0.99472781 0.98940987 0.98946562]

mean value: 0.9904762341887853

key: test_accuracy
value: [0.9787234  0.93617021 1.         1.         0.93617021 0.91489362
 0.95744681 0.97826087 0.97826087 0.93478261]

mean value: 0.9614708603145236

key: train_accuracy
value: [0.99047619 1.         0.99761905 0.9952381  0.99285714 0.99761905
 0.9952381  0.9976247  0.99524941 0.99524941]

mean value: 0.995717113448705

key: test_fscore
value: [0.98412698 0.95238095 1.         1.         0.95238095 0.93939394
 0.96666667 0.98360656 0.98412698 0.94915254]

mean value: 0.9711835578826409

key: train_fscore
value: [0.99285714 1.         0.99820467 0.99638989 0.99463327 0.99820467
 0.99638989 0.9981982  0.99640288 0.99640288]

mean value: 0.9967683489274677

key: test_precision
value: [0.96875    0.9375     1.         1.         0.9375     0.88571429
 1.         1.         0.96875    0.96551724]

mean value: 0.9663731527093596

key: train_precision
value: [0.9858156  1.         0.99641577 1.         0.98932384 0.99641577
 1.         1.         0.99640288 1.        ]

mean value: 0.9964373865169729

key: test_recall
value: [1.         0.96774194 1.         1.         0.96774194 1.
 0.93548387 0.96774194 1.         0.93333333]

mean value: 0.9772043010752688

key: train_recall
value: [1.         1.         1.         0.99280576 1.         1.
 0.99280576 0.99640288 0.99640288 0.99283154]

mean value: 0.9971248807405688

key: test_roc_auc
value: [0.96875    0.92137097 1.         1.         0.92137097 0.875
 0.96774194 0.98387097 0.96666667 0.93541667]

mean value: 0.954018817204301

key: train_roc_auc
value: [0.98591549 1.         0.99647887 0.99640288 0.98943662 0.99647887
 0.99640288 0.99820144 0.99470494 0.99641577]

mean value: 0.995043775936127

key: test_jcc
value: [0.96875    0.90909091 1.         1.         0.90909091 0.88571429
 0.93548387 0.96774194 0.96875    0.90322581]

mean value: 0.9447847716799329

key: train_jcc
value: [0.9858156  1.         0.99641577 0.99280576 0.98932384 0.99641577
 0.99280576 0.99640288 0.99283154 0.99283154]

mean value: 0.9935648458398372

MCC on Blind test: 0.07

Accuracy on Blind test: 0.19
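The BaggingClassifier warnings logged for this run are "some inputs do not have OOB scores", followed by an invalid-value division. Both are consistent with the estimator's default of 10 base estimators. The n_jobs=10 in the model string sets parallelism, not ensemble size. The sketch below is a back-of-the-envelope estimate of how many rows are expected to be in-bag for every estimator, and therefore to have no OOB prediction. It assumes about 420 training rows per fold, based on train_accuracy 0.99047619 = 416/420.

```python
n_train = 420       # assumed training rows per CV fold in this log
n_estimators = 10   # BaggingClassifier default ensemble size

# Probability that a given row is drawn at least once in one bootstrap
# sample of size n_train (sampling with replacement)
p_in_bag = 1 - (1 - 1 / n_train) ** n_train

# Probability that the row is in-bag for every estimator, i.e. never out-of-bag
p_never_oob = p_in_bag ** n_estimators

# Expected number of rows per fold that receive no OOB prediction at all
expected_rows_without_oob = n_train * p_never_oob

print(f"P(in-bag)    = {p_in_bag:.4f}")
print(f"P(never OOB) = {p_never_oob:.4f}")
print(f"Expected rows with no OOB prediction: {expected_rows_without_oob:.1f}")
```

The figures come out at roughly 0.63, 0.010 and about 4 rows. Each of those rows has a prediction count of zero, so its OOB decision value is 0/0. That matches the true_divide RuntimeWarning. Raising n_estimators is the usual way to make this vanishingly unlikely.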

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.06595278 0.07601976 0.07935166 0.12056971 0.07869649 0.06834197
 0.13297486 0.15896058 0.14390469 0.13192463]

mean value: 0.10566971302032471

key: score_time
value: [0.01232171 0.01868176 0.01198316 0.01882172 0.01206756 0.01199055
 0.01884794 0.02548599 0.02592111 0.0252378 ]

mean value: 0.018135929107666017

key: test_mcc
value: [0.90662544 0.60908698 0.4512753  0.65994312 0.71206211 0.6139232
 0.66402366 0.59332241 0.43161973 0.76764947]

mean value: 0.6409531430663058

key: train_mcc
value: [0.80273059 0.7991351  0.79087061 0.79295441 0.78611575 0.79743374
 0.78683895 0.80017613 0.80374289 0.79643548]

mean value: 0.7956433649163105

key: test_accuracy
value: [0.95744681 0.82978723 0.76595745 0.85106383 0.87234043 0.82978723
 0.85106383 0.82608696 0.76086957 0.89130435]

mean value: 0.8435707678075856

key: train_accuracy
value: [0.91190476 0.90952381 0.90714286 0.90714286 0.9047619  0.90952381
 0.9047619  0.90973872 0.91211401 0.90973872]

mean value: 0.9086353353693021

key: test_fscore
value: [0.96875    0.87878788 0.8358209  0.89230769 0.90909091 0.88235294
 0.88888889 0.875      0.83076923 0.92307692]

mean value: 0.8884845359620381

key: train_fscore
value: [0.93653516 0.93537415 0.93287435 0.93356048 0.93150685 0.93493151
 0.93174061 0.93537415 0.93653516 0.9347079 ]

mean value: 0.9343140331061971

key: test_precision
value: [0.93939394 0.82857143 0.77777778 0.85294118 0.85714286 0.81081081
 0.875      0.84848485 0.79411765 0.85714286]

mean value: 0.8441383342853931

key: train_precision
value: [0.89508197 0.88709677 0.89438944 0.88673139 0.88888889 0.89215686
 0.88636364 0.88709677 0.89508197 0.89768977]

mean value: 0.8910577470317502

key: test_recall
value: [1.         0.93548387 0.90322581 0.93548387 0.96774194 0.96774194
 0.90322581 0.90322581 0.87096774 1.        ]

mean value: 0.9387096774193548

key: train_recall
value: [0.98201439 0.98920863 0.97482014 0.98561151 0.97841727 0.98201439
 0.98201439 0.98920863 0.98201439 0.97491039]

mean value: 0.9820234135272428

key: test_roc_auc
value: [0.9375     0.78024194 0.7016129  0.81149194 0.82762097 0.76512097
 0.8266129  0.78494624 0.70215054 0.84375   ]

mean value: 0.7981048387096774

key: train_roc_auc
value: [0.87833114 0.87136488 0.87473402 0.86956632 0.86949032 0.87481001
 0.86776776 0.87222669 0.87911908 0.87830027]

mean value: 0.8735710488300057

key: test_jcc
value: [0.93939394 0.78378378 0.71794872 0.80555556 0.83333333 0.78947368
 0.8        0.77777778 0.71052632 0.85714286]

mean value: 0.8014935964935965

key: train_jcc
value: [0.88064516 0.87859425 0.87419355 0.87539936 0.87179487 0.8778135
 0.87220447 0.87859425 0.88064516 0.87741935]

mean value: 0.8767303934692845

MCC on Blind test: 0.21

Accuracy on Blind test: 0.42

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.2230103  0.21394682 0.20187664 0.20994234 0.20898438 0.21629405
 0.21086693 0.20910215 0.21160555 0.20683503]

mean value: 0.21124641895294188

key: score_time
value: [0.00933719 0.00840378 0.00872827 0.00917697 0.00930619 0.00924182
 0.00842547 0.00914001 0.00950432 0.00904679]

mean value: 0.009031081199645996

key: test_mcc
value: [1.         0.8566725  1.         1.         0.95299692 0.81503725
 1.         0.95250095 0.95087679 0.80833333]

mean value: 0.9336417737001077

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93617021 1.         1.         0.9787234  0.91489362
 1.         0.97826087 0.97826087 0.91304348]

mean value: 0.9699352451433858

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 1.         1.         0.98412698 0.93939394
 1.         0.98360656 0.98412698 0.93333333]

mean value: 0.9776968750739242

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9375     1.         1.         0.96875    0.88571429
 1.         1.         0.96875    0.93333333]

mean value: 0.9694047619047619

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         1.         1.
 1.         0.96774194 1.         0.93333333]

mean value: 0.9868817204301076

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.92137097 1.         1.         0.96875    0.875
 1.         0.98387097 0.96666667 0.90416667]

mean value: 0.9619825268817205

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 1.         1.         0.96875    0.88571429
 1.         0.96774194 0.96875    0.875     ]

mean value: 0.9575047130289066

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.19

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.0117166  0.01313019 0.01318526 0.01325989 0.01305079 0.01312709
 0.01312232 0.01324248 0.01329851 0.0137887 ]

mean value: 0.01309218406677246

key: score_time
value: [0.0111506  0.01089978 0.01084971 0.0108676  0.01087546 0.01084447
 0.01105189 0.01162434 0.01162648 0.01165462]

mean value: 0.011144495010375977

key: test_mcc
value: [0.46502704 0.68913865 0.66402366 0.71206211 0.6139232  0.67402153
 0.62096774 0.74844698 0.44695591 0.53674504]

mean value: 0.6171311872005444

key: train_mcc
value: [0.6778431  0.7128472  0.85474068 0.79307454 0.73273261 0.88954988
 0.79770673 0.82923345 0.77993671 0.88249782]

mean value: 0.7950162701330918

key: test_accuracy
value: [0.70212766 0.85106383 0.85106383 0.87234043 0.82978723 0.85106383
 0.82978723 0.89130435 0.76086957 0.7826087 ]

mean value: 0.8222016651248844

key: train_accuracy
value: [0.82142857 0.8452381  0.93333333 0.9047619  0.88095238 0.95
 0.90714286 0.9239905  0.90261283 0.94536817]

mean value: 0.9014828639294198

key: test_fscore
value: [0.73076923 0.88135593 0.88888889 0.90909091 0.88235294 0.8852459
 0.87096774 0.92307692 0.82539683 0.82758621]

mean value: 0.8624731501074018

key: train_fscore
value: [0.84662577 0.86973948 0.94871795 0.92647059 0.91582492 0.96188748
 0.92844037 0.94425087 0.92794376 0.95779817]

mean value: 0.9227699340095629

key: test_precision
value: [0.9047619  0.92857143 0.875      0.85714286 0.81081081 0.9
 0.87096774 0.88235294 0.8125     0.85714286]

mean value: 0.8699250541541813

key: train_precision
value: [0.98104265 0.98190045 0.96641791 0.94736842 0.86075949 0.97069597
 0.94756554 0.91554054 0.90721649 0.98120301]

mean value: 0.9459710488360232

key: test_recall
value: [0.61290323 0.83870968 0.90322581 0.96774194 0.96774194 0.87096774
 0.87096774 0.96774194 0.83870968 0.8       ]

mean value: 0.8638709677419355

key: train_recall
value: [0.74460432 0.78057554 0.93165468 0.90647482 0.97841727 0.95323741
 0.91007194 0.97482014 0.94964029 0.93548387]

mean value: 0.906498027384544

key: test_roc_auc
value: [0.74395161 0.85685484 0.8266129  0.82762097 0.76512097 0.84173387
 0.81048387 0.85053763 0.71935484 0.775     ]

mean value: 0.8017271505376344

key: train_roc_auc
value: [0.85821765 0.87620326 0.9341372  0.90394164 0.83427906 0.94844969
 0.9057402  0.89999748 0.88041455 0.9501363 ]

mean value: 0.8991517025527074

key: test_jcc
value: [0.57575758 0.78787879 0.8        0.83333333 0.78947368 0.79411765
 0.77142857 0.85714286 0.7027027  0.70588235]

mean value: 0.7617717512454355

key: train_jcc
value: [0.73404255 0.76950355 0.90243902 0.8630137  0.8447205  0.92657343
 0.86643836 0.89438944 0.86557377 0.91901408]

mean value: 0.8585708395886121

MCC on Blind test: 0.13

Accuracy on Blind test: 0.63
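QDA emitted "Variables are collinear" 11 times. That may be once per CV fold plus one refit, though the log does not say so. One plausible contributor is the categorical block. The preprocessing step uses OneHotEncoder() without drop, so the dummy columns for each categorical feature sum to 1 on every row, and their covariance matrix is singular. The sketch below uses a hypothetical three-level column as a stand-in for a feature such as ss_class. It is not the script's data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three levels
cats = np.array([["helix"], ["sheet"], ["loop"], ["helix"], ["loop"], ["sheet"]])

# Full dummy encoding, as with OneHotEncoder() in the logged pipeline
X_full = OneHotEncoder().fit_transform(cats).toarray()
row_sums = X_full.sum(axis=1)                      # every row sums to 1
rank_full = np.linalg.matrix_rank(np.cov(X_full, rowvar=False))

# Dropping one level per feature removes that exact linear dependency
X_drop = OneHotEncoder(drop="first").fit_transform(cats).toarray()
rank_drop = np.linalg.matrix_rank(np.cov(X_drop, rowvar=False))

print(X_full.shape[1], rank_full)  # 3 dummy columns, covariance rank 2
print(X_drop.shape[1], rank_drop)  # 2 dummy columns, covariance rank 2 (full)
```

OneHotEncoder(drop="first") and QuadraticDiscriminantAnalysis(reg_param=...) are the standard ways to address this. Whether either one changes the QDA scores here would need to be checked by rerunning the pipeline.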

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.02074528 0.02051353 0.01858282 0.02953506 0.02967763 0.0314672
 0.02946687 0.02941847 0.02958179 0.02945447]

mean value: 0.026844310760498046

key: score_time
value: [0.02140307 0.01061296 0.01083541 0.02066278 0.0109098  0.01821399
 0.02039957 0.01891303 0.02117467 0.01968718]

mean value: 0.017281246185302735

key: test_mcc
value: [0.95299692 0.8084425  0.8566725  0.95299692 0.90662544 0.76032282
 0.90662544 0.80215054 0.75776742 0.85513419]

mean value: 0.8559734697377736

key: train_mcc
value: [0.92003671 0.92030205 0.87684521 0.89326029 0.93085643 0.90414739
 0.88770942 0.9151442  0.88322214 0.90932054]

mean value: 0.9040844381960059

key: test_accuracy
value: [0.9787234  0.91489362 0.93617021 0.9787234  0.95744681 0.89361702
 0.95744681 0.91304348 0.89130435 0.93478261]

mean value: 0.9356151711378353

key: train_accuracy
value: [0.96428571 0.96428571 0.9452381  0.95238095 0.96904762 0.95714286
 0.95       0.96199525 0.94774347 0.95961995]

mean value: 0.9571739622214681

key: test_fscore
value: [0.98412698 0.9375     0.95238095 0.98412698 0.96875    0.92307692
 0.96875    0.93548387 0.91803279 0.95081967]

mean value: 0.9523048173695978

key: train_fscore
value: [0.97345133 0.97354497 0.95943563 0.96478873 0.97699115 0.96830986
 0.96296296 0.97173145 0.96140351 0.97001764]

mean value: 0.9682637226255115

key: test_precision
value: [0.96875    0.90909091 0.9375     0.96875    0.93939394 0.88235294
 0.93939394 0.93548387 0.93333333 0.93548387]

mean value: 0.9349532804324076

key: train_precision
value: [0.95818815 0.9550173  0.94117647 0.94482759 0.96167247 0.94827586
 0.94463668 0.95486111 0.93835616 0.95486111]

mean value: 0.9501872911886335

key: test_recall
value: [1.         0.96774194 0.96774194 1.         1.         0.96774194
 1.         0.93548387 0.90322581 0.96666667]

mean value: 0.9708602150537634

key: train_recall
value: [0.98920863 0.99280576 0.97841727 0.98561151 0.99280576 0.98920863
 0.98201439 0.98920863 0.98561151 0.98566308]

mean value: 0.9870555168768211

key: test_roc_auc
value: [0.96875    0.89012097 0.92137097 0.96875    0.9375     0.85887097
 0.9375     0.90107527 0.88494624 0.92083333]

mean value: 0.9189717741935484

key: train_roc_auc
value: [0.9523508  0.95062823 0.92934948 0.93646773 0.95767048 0.94178742
 0.93466917 0.94914977 0.92986869 0.94705689]

mean value: 0.9428998652048836

key: test_jcc
value: [0.96875    0.88235294 0.90909091 0.96875    0.93939394 0.85714286
 0.93939394 0.87878788 0.84848485 0.90625   ]

mean value: 0.9098397313470843

key: train_jcc
value: [0.94827586 0.94845361 0.9220339  0.93197279 0.9550173  0.93856655
 0.92857143 0.94501718 0.92567568 0.94178082]

mean value: 0.9385365119971703

MCC on Blind test: 0.18

Accuracy on Blind test: 0.4
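The Ridge ClassifierCV block logs two SettingWithCopyWarning messages, from katg_config.py lines 122 and 125. They come from sorting baseline_CT and baseline_BT in place. The sketch below shows a warning-free pattern. It assumes, hypothetically, that both tables are column subsets of one results frame. The `results` frame and its column names are stand-ins filled with the rounded CV mean MCC and blind-test MCC values printed in this log, not the script's actual objects.

```python
import pandas as pd

# Hypothetical results table; values are rounded from this log
results = pd.DataFrame({
    "model_name": ["Bagging Classifier", "Gaussian Process", "Gradient Boosting",
                   "QDA", "Ridge Classifier"],
    "test_mcc": [0.9156, 0.6410, 0.9336, 0.6171, 0.8560],  # mean CV test MCC
    "bts_mcc": [0.07, 0.21, 0.08, 0.13, 0.18],              # MCC on blind test
})

# Option 1: take an explicit copy, then sorting in place is unambiguous
baseline_CT = results[["model_name", "test_mcc"]].copy()
baseline_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: skip inplace and keep the sorted frame that sort_values returns
baseline_BT = results[["model_name", "bts_mcc"]].sort_values(by=["bts_mcc"],
                                                             ascending=False)

print(baseline_CT)
print(baseline_BT)
```

Neither option modifies `results`. With these values, the two orderings are close to reversed. Gradient Boosting and Bagging rank highest on CV MCC but lowest on blind-test MCC.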

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.20578527 0.09308887 0.1543951  0.2597549  0.22493172 0.19128704
 0.15882158 0.10618186 0.18721414 0.18994951]

mean value: 0.17714099884033202

key: score_time
value: [0.01115108 0.01121378 0.0221827  0.02100086 0.02120638 0.02156854
 0.01102662 0.0211966  0.01679158 0.01401901]

mean value: 0.01713571548461914

key: test_mcc
value: [1.         0.8566725  1.         0.95299692 0.90662544 0.81503725
 0.95299692 0.9085301  0.90107527 0.85513419]

mean value: 0.914906858369952

key: train_mcc
value: [0.92522791 0.94131391 0.91988445 0.92534566 0.93598399 0.94131391
 0.93066133 0.94171645 0.93099139 0.94680199]

mean value: 0.9339241011569926

key: test_accuracy
value: [1.         0.93617021 1.         0.9787234  0.95744681 0.91489362
 0.9787234  0.95652174 0.95652174 0.93478261]

mean value: 0.9613783533765032

key: train_accuracy
value: [0.96666667 0.97380952 0.96428571 0.96666667 0.97142857 0.97380952
 0.96904762 0.97387173 0.96912114 0.97624703]

mean value: 0.970495419070241

key: test_fscore
value: [1.         0.95238095 1.         0.98412698 0.96875    0.93939394
 0.98412698 0.96666667 0.96774194 0.95081967]

mean value: 0.9714007134310545

key: train_fscore
value: [0.97508897 0.98039216 0.97335702 0.9751773  0.97864769 0.98039216
 0.97690941 0.98046181 0.97690941 0.9822695 ]

mean value: 0.9779605432457805

key: test_precision
value: [1.         0.9375     1.         0.96875    0.93939394 0.88571429
 0.96875    1.         0.96774194 0.93548387]

mean value: 0.9603334031559838

key: train_precision
value: [0.96478873 0.97173145 0.96140351 0.96153846 0.96830986 0.97173145
 0.96491228 0.96842105 0.96491228 0.97192982]

mean value: 0.9669678897982681

key: test_recall
value: [1.         0.96774194 1.         1.         1.         1.
 1.         0.93548387 0.96774194 0.96666667]

mean value: 0.983763440860215

key: train_recall
value: [0.98561151 0.98920863 0.98561151 0.98920863 0.98920863 0.98920863
 0.98920863 0.99280576 0.98920863 0.99283154]

mean value: 0.9892112116758206

key: test_roc_auc
value: [1.         0.92137097 1.         0.96875    0.9375     0.875
 0.96875    0.96774194 0.95053763 0.92083333]

mean value: 0.9510483870967742

key: train_roc_auc
value: [0.95759449 0.9664353  0.95407336 0.95587192 0.96291418 0.9664353
 0.95939305 0.96493435 0.95963928 0.96824676]

mean value: 0.9615537984903284

key: test_jcc
value: [1.         0.90909091 1.         0.96875    0.93939394 0.88571429
 0.96875    0.93548387 0.9375     0.90625   ]

mean value: 0.9450933005166876

key: train_jcc
value: [0.95138889 0.96153846 0.94809689 0.95155709 0.95818815 0.96153846
 0.95486111 0.96167247 0.95486111 0.96515679]

mean value: 0.9568859435029576

MCC on Blind test: 0.14

Accuracy on Blind test: 0.33

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02589321 0.0243125  0.02480197 0.02414203 0.02598643 0.0267787
 0.02300882 0.02308178 0.02694511 0.02658701]

mean value: 0.025153756141662598

key: score_time
value: [0.01105022 0.01108479 0.02707553 0.01083922 0.01093078 0.01091671
 0.01093459 0.01086307 0.01094246 0.01091409]

mean value: 0.012555146217346191

key: test_mcc
value: [1.         0.7130241  0.77784447 0.83914639 0.87096774 0.87096774
 0.74193548 0.84266484 0.67314268 0.8688172 ]

mean value: 0.8198510652102912

key: train_mcc
value: [0.87415162 0.85611511 0.87052613 0.84894283 0.84894283 0.84892086
 0.85256763 0.84537297 0.86364692 0.85997009]

mean value: 0.8569156981998511

key: test_accuracy
value: [1.         0.85483871 0.88709677 0.91935484 0.93548387 0.93548387
 0.87096774 0.91935484 0.83606557 0.93442623]

mean value: 0.9093072448439978

key: train_accuracy
value: [0.93705036 0.92805755 0.9352518  0.92446043 0.92446043 0.92446043
 0.92625899 0.92266187 0.93177738 0.92998205]

mean value: 0.9284421295997314

key: test_fscore
value: [1.         0.86153846 0.89230769 0.92063492 0.93548387 0.93548387
 0.87096774 0.91525424 0.84375    0.93333333]

mean value: 0.9108754128973511

key: train_fscore
value: [0.93670886 0.92805755 0.93548387 0.92473118 0.92473118 0.92446043
 0.92665474 0.92307692 0.93214286 0.92998205]

mean value: 0.9286029650436789

key: test_precision
value: [1.         0.82352941 0.85294118 0.90625    0.93548387 0.93548387
 0.87096774 0.96428571 0.81818182 0.93333333]

mean value: 0.9040456937907128

key: train_precision
value: [0.94181818 0.92805755 0.93214286 0.92142857 0.92142857 0.92446043
 0.92170819 0.91814947 0.92553191 0.93165468]

mean value: 0.9266380409827853

key: test_recall
value: [1.         0.90322581 0.93548387 0.93548387 0.93548387 0.93548387
 0.87096774 0.87096774 0.87096774 0.93333333]

mean value: 0.9191397849462365

key: train_recall
value: [0.93165468 0.92805755 0.93884892 0.92805755 0.92805755 0.92446043
 0.93165468 0.92805755 0.93884892 0.92831541]

mean value: 0.9306013253912999

key: test_roc_auc
value: [1.         0.85483871 0.88709677 0.91935484 0.93548387 0.93548387
 0.87096774 0.91935484 0.83548387 0.9344086 ]

mean value: 0.909247311827957

key: train_roc_auc
value: [0.93705036 0.92805755 0.9352518  0.92446043 0.92446043 0.92446043
 0.92625899 0.92266187 0.93179005 0.92998504]

mean value: 0.9284436966555788

key: test_jcc
value: [1.         0.75675676 0.80555556 0.85294118 0.87878788 0.87878788
 0.77142857 0.84375    0.72972973 0.875     ]

mean value: 0.839273754751696

key: train_jcc
value: [0.88095238 0.86577181 0.87878788 0.86       0.86       0.85953177
 0.86333333 0.85714286 0.8729097  0.86912752]

mean value: 0.8667557250647416

MCC on Blind test: 0.21

Accuracy on Blind test: 0.5

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.84429264 0.72940993 0.72171068 0.85939646 0.69464445 0.72773337
 0.77860117 0.70124364 0.78092885 0.7428112 ]

mean value: 0.7580772399902344

key: score_time
value: [0.01205468 0.01223755 0.01254439 0.01247644 0.02100563 0.01274776
 0.01243854 0.01249003 0.01463079 0.01232004]

mean value: 0.013494586944580078

key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.90369611 0.93548387
 1.         0.87278605 0.90215054 0.8688172 ]

mean value: 0.9262394532240339
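The "mean value" line appears to be the plain arithmetic mean of the ten per-fold scores printed above it. Because the fold values are printed rounded to 8 decimals, a recomputation agrees only to about that precision. The sketch below reproduces it for test_mcc. The standard deviation is an illustrative addition that the script does not print, and it shows how much the folds vary around that mean.

```python
from statistics import mean, stdev

# Ten per-fold test_mcc values copied from the Logistic RegressionCV block
test_mcc = [0.96824584, 0.93548387, 0.96824584, 0.90748521, 0.90369611,
            0.93548387, 1.0, 0.87278605, 0.90215054, 0.8688172]

fold_mean = mean(test_mcc)  # corresponds to the logged "mean value"
fold_sd = stdev(test_mcc)   # sample standard deviation across the 10 folds
print(f"mean={fold_mean:.6f} sd={fold_sd:.6f}")
```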

key: train_mcc
value: [0.94966486 0.96412858 0.94604929 0.96763216 0.96405373 0.96405373
 0.94604929 0.97482645 0.96774069 0.96783888]

mean value: 0.9612037646576601

key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129  0.9516129  0.96774194
 1.         0.93548387 0.95081967 0.93442623]

mean value: 0.9627181385510312

key: train_accuracy
value: [0.97482014 0.98201439 0.97302158 0.98381295 0.98201439 0.98201439
 0.97302158 0.98741007 0.98384201 0.98384201]

mean value: 0.9805813517946863

key: test_fscore
value: [0.98412698 0.96774194 0.98360656 0.95384615 0.95081967 0.96774194
 1.         0.93333333 0.95081967 0.93333333]

mean value: 0.9625369577246891

key: train_fscore
value: [0.97491039 0.98214286 0.97307002 0.98384201 0.98207885 0.98207885
 0.97307002 0.98743268 0.98389982 0.98401421]

mean value: 0.9806539709925397

key: test_precision
value: [0.96875    0.96774194 1.         0.91176471 0.96666667 0.96774194
 1.         0.96551724 0.96666667 0.93333333]

mean value: 0.9648182484896072

key: train_precision
value: [0.97142857 0.9751773  0.97132616 0.98207885 0.97857143 0.97857143
 0.97132616 0.98566308 0.97864769 0.97535211]

mean value: 0.9768142798277739

key: test_recall
value: [1.         0.96774194 0.96774194 1.         0.93548387 0.96774194
 1.         0.90322581 0.93548387 0.93333333]

mean value: 0.9610752688172043

key: train_recall
value: [0.97841727 0.98920863 0.97482014 0.98561151 0.98561151 0.98561151
 0.97482014 0.98920863 0.98920863 0.99283154]

mean value: 0.9845349526830148

key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129  0.9516129  0.96774194
 1.         0.93548387 0.95107527 0.9344086 ]

mean value: 0.962741935483871

key: train_roc_auc
value: [0.97482014 0.98201439 0.97302158 0.98381295 0.98201439 0.98201439
 0.97302158 0.98741007 0.98385163 0.98382584]

mean value: 0.9805806967329362
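In this block test_roc_auc tracks test_accuracy almost exactly. That pattern is consistent with the AUC being scored from hard 0/1 predictions rather than probabilities, although this is an inference and the script does not confirm it. With hard labels the ROC curve is the polyline (0,0) to (FPR, TPR) to (1,1), and its area reduces to balanced accuracy, (TPR + TNR) / 2. The sketch below checks this against fold 10. It uses counts inferred from that fold's recall, precision and accuracy (TP 28, FN 2, FP 2, TN 29); these counts are an assumption, not logged values.

```python
def auc_from_hard_labels(tp: int, fn: int, fp: int, tn: int) -> float:
    """ROC AUC when the classifier's scores are hard 0/1 predictions.

    The ROC curve is (0,0) -> (FPR, TPR) -> (1,1); the trapezoid area of
    that polyline is (1 + TPR - FPR) / 2, which equals balanced accuracy.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return (1 + tpr - fpr) / 2

# Counts consistent with fold 10 of the Logistic RegressionCV run (inferred)
tp, fn, fp, tn = 28, 2, 2, 29
fold_auc = auc_from_hard_labels(tp, fn, fp, tn)
fold_acc = (tp + tn) / (tp + fn + fp + tn)
print(round(fold_auc, 8), round(fold_acc, 8))
```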

key: test_jcc
value: [0.96875    0.9375     0.96774194 0.91176471 0.90625    0.9375
 1.         0.875      0.90625    0.875     ]

mean value: 0.9285756641366224

key: train_jcc
value: [0.95104895 0.96491228 0.94755245 0.96819788 0.96478873 0.96478873
 0.94755245 0.9751773  0.96830986 0.96853147]

mean value: 0.9620860104153928
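The test_jcc and test_fscore rows are not independent. For the same confusion matrix the Jaccard index J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN) satisfy J = F1 / (2 - F1). The sketch below checks the identity on fold 10 of this block (test_fscore 0.93333333, test_jcc 0.875). It also computes J directly from counts inferred for that fold (TP 28, FP 2, FN 2); those counts are an assumption, not printed by the script.

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by an F1 score for the same predictions."""
    return f1 / (2.0 - f1)


def jaccard_from_counts(tp: int, fp: int, fn: int) -> float:
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


# Fold 10 of the Logistic RegressionCV block above
j_from_f1 = jaccard_from_f1(0.93333333)
j_from_counts = jaccard_from_counts(tp=28, fp=2, fn=2)  # inferred counts
print(j_from_f1, j_from_counts)
```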

MCC on Blind test: 0.14

Accuracy on Blind test: 0.35
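MCC uses all four confusion-matrix cells and ranges from -1 to 1, where 0 means no better than chance. The blind-test MCC of 0.14 is therefore far below this model's cross-validated mean of about 0.93. The sketch below implements the standard formula and checks it against fold 10's test_mcc (0.8688172). It uses counts inferred from that fold's recall and precision (TP 28, TN 29, FP 2, FN 2); these are an assumption, not logged values.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row/column total is zero
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


fold10_mcc = mcc(tp=28, tn=29, fp=2, fn=2)  # inferred fold-10 counts
print(round(fold10_mcc, 8))
```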

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01086211 0.01025057 0.00859928 0.00814486 0.00849771 0.00856209
 0.0083437  0.00834632 0.00836563 0.0083468 ]

mean value: 0.00883190631866455

key: score_time
value: [0.01082826 0.00907207 0.00906467 0.00892687 0.00863767 0.00889754
 0.00867772 0.00834537 0.00863099 0.00864482]

mean value: 0.008972597122192384

key: test_mcc
value: [0.83914639 0.64820372 0.71004695 0.81325006 0.80645161 0.74348441
 0.61418277 0.87278605 0.60645161 0.70505961]

mean value: 0.7359063194782269

key: train_mcc
value: [0.75529076 0.7627676  0.76266888 0.74820144 0.73741484 0.74837576
 0.74460913 0.73025835 0.76301539 0.75249226]

mean value: 0.7505094421634964

key: test_accuracy
value: [0.91935484 0.82258065 0.85483871 0.90322581 0.90322581 0.87096774
 0.80645161 0.93548387 0.80327869 0.85245902]

mean value: 0.8671866737176097

key: train_accuracy
value: [0.87410072 0.88129496 0.88129496 0.87410072 0.86870504 0.87410072
 0.87230216 0.86510791 0.88150808 0.87612208]

mean value: 0.8748637355824497

key: test_fscore
value: [0.92063492 0.83076923 0.85245902 0.90909091 0.90322581 0.875
 0.8        0.93333333 0.80645161 0.84745763]

mean value: 0.8678422456695318

key: train_fscore
value: [0.88215488 0.88       0.88214286 0.87410072 0.86894075 0.87272727
 0.87253142 0.86437613 0.88129496 0.87477314]

mean value: 0.8753042137774966

key: test_precision
value: [0.90625    0.79411765 0.86666667 0.85714286 0.90322581 0.84848485
 0.82758621 0.96551724 0.80645161 0.86206897]

mean value: 0.8637511852501137

key: train_precision
value: [0.82911392 0.88970588 0.87588652 0.87410072 0.86738351 0.88235294
 0.87096774 0.86909091 0.88129496 0.88602941]

mean value: 0.8725926531191879

key: test_recall
value: [0.93548387 0.87096774 0.83870968 0.96774194 0.90322581 0.90322581
 0.77419355 0.90322581 0.80645161 0.83333333]

mean value: 0.8736559139784946

key: train_recall
value: [0.94244604 0.8705036  0.88848921 0.87410072 0.8705036  0.86330935
 0.87410072 0.85971223 0.88129496 0.86379928]

mean value: 0.8788259714808798

key: test_roc_auc
value: [0.91935484 0.82258065 0.85483871 0.90322581 0.90322581 0.87096774
 0.80645161 0.93548387 0.80322581 0.85215054]

mean value: 0.8671505376344086

key: train_roc_auc
value: [0.87410072 0.88129496 0.88129496 0.87410072 0.86870504 0.87410072
 0.87230216 0.86510791 0.8815077  0.87614425]

mean value: 0.8748659137206364

key: test_jcc
value: [0.85294118 0.71052632 0.74285714 0.83333333 0.82352941 0.77777778
 0.66666667 0.875      0.67567568 0.73529412]

mean value: 0.7693601617982423

key: train_jcc
value: [0.78915663 0.78571429 0.78913738 0.77635783 0.76825397 0.77419355
 0.77388535 0.7611465  0.78778135 0.77741935]

mean value: 0.778304618898389

MCC on Blind test: 0.2

Accuracy on Blind test: 0.54

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00874949 0.00879979 0.0084486  0.00865364 0.00868559 0.00862646
 0.00848246 0.00858378 0.00882339 0.00859857]

mean value: 0.008645176887512207

key: score_time
value: [0.00911641 0.00886369 0.00856209 0.00891066 0.00887418 0.00860906
 0.00873876 0.00870085 0.00862622 0.008708  ]

mean value: 0.008770990371704101

key: test_mcc
value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441
 0.64820372 0.74193548 0.63978495 0.67204301]

mean value: 0.6813286129520032

key: train_mcc
value: [0.69129181 0.69623388 0.69785979 0.69872831 0.69209976 0.70569372
 0.7019886  0.70220704 0.69929441 0.69881448]

mean value: 0.698421180066379

key: test_accuracy
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
 0.82258065 0.87096774 0.81967213 0.83606557]

mean value: 0.8397673188789001

key: train_accuracy
value: [0.84532374 0.8471223  0.84892086 0.84892086 0.84532374 0.85251799
 0.85071942 0.85071942 0.8491921  0.8491921 ]

mean value: 0.8487952546400941

key: test_fscore
value: [0.81355932 0.84848485 0.75       0.875      0.9        0.875
 0.83076923 0.87096774 0.81967213 0.83333333]

mean value: 0.8416786607704336

key: train_fscore
value: [0.84859155 0.85268631 0.84837545 0.85263158 0.85017422 0.8556338
 0.85361552 0.85413005 0.85263158 0.85211268]

mean value: 0.8520582734853629

key: test_precision
value: [0.85714286 0.8        0.72727273 0.84848485 0.93103448 0.84848485
 0.79411765 0.87096774 0.83333333 0.83333333]

mean value: 0.8344171819804876

key: train_precision
value: [0.83103448 0.82274247 0.85144928 0.83219178 0.82432432 0.83793103
 0.83737024 0.83505155 0.83219178 0.83737024]

mean value: 0.8341657184309065

key: test_recall
value: [0.77419355 0.90322581 0.77419355 0.90322581 0.87096774 0.90322581
 0.87096774 0.87096774 0.80645161 0.83333333]

mean value: 0.8510752688172043

key: train_recall
value: [0.86690647 0.88489209 0.84532374 0.87410072 0.87769784 0.87410072
 0.8705036  0.87410072 0.87410072 0.86738351]

mean value: 0.8709110131249839

key: test_roc_auc
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
 0.82258065 0.87096774 0.81989247 0.83602151]

mean value: 0.8397849462365592

key: train_roc_auc
value: [0.84532374 0.8471223  0.84892086 0.84892086 0.84532374 0.85251799
 0.85071942 0.85071942 0.84923674 0.84915938]

mean value: 0.8487964467135968

key: test_jcc
value: [0.68571429 0.73684211 0.6        0.77777778 0.81818182 0.77777778
 0.71052632 0.77142857 0.69444444 0.71428571]

mean value: 0.7286978810663021

key: train_jcc
value: [0.73700306 0.74320242 0.73667712 0.74311927 0.73939394 0.74769231
 0.74461538 0.74539877 0.74311927 0.74233129]

mean value: 0.7422552816171282

MCC on Blind test: 0.19

Accuracy on Blind test: 0.49
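Each metric above is printed as a `key:` line, the ten per-fold scores and their arithmetic mean. The key names (`fit_time`, `score_time`, `test_*`, `train_*`) appear to follow the dictionary returned by scikit-learn's `cross_validate` with `return_train_score=True`. A minimal sketch for pulling these blocks back out of the log for comparison. `parse_cv_blocks` and its regular expression are hypothetical helpers written against the layout shown here, not part of the original script.

```python
import re
from statistics import mean

# Hypothetical helper: matches one "key: ... / value: [...] / mean value: ..." block.
# [^\]]* also spans the line break numpy inserts inside long printed arrays.
BLOCK = re.compile(
    r"key: (?P<key>\w+)\s*"
    r"value: \[(?P<values>[^\]]*)\]\s*"
    r"mean value: (?P<mean>[-+0-9.eE]+)"
)


def parse_cv_blocks(text):
    """Return {metric_name: (per_fold_values, logged_mean)} for every block in text."""
    results = {}
    for match in BLOCK.finditer(text):
        values = [float(v) for v in match.group("values").split()]
        results[match.group("key")] = (values, float(match.group("mean")))
    return results


# Excerpt copied from the test_mcc block above.
sample = """key: test_mcc
value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441
 0.64820372 0.74193548 0.63978495 0.67204301]

mean value: 0.6813286129520032
"""

folds, logged_mean = parse_cv_blocks(sample)["test_mcc"]
# Printed fold scores are rounded to 8 decimals, so compare with a tolerance.
print(len(folds), abs(mean(folds) - logged_mean) < 1e-6)  # prints: 10 True
```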

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])
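The `prep` step applies `MinMaxScaler` to the numerical columns and `OneHotEncoder` to the seven categorical ones, and passes any remaining column through unchanged (`remainder='passthrough'`). Because the transformers sit inside the Pipeline, their parameters are learned from each training fold only, so held-out values can fall outside [0, 1]. The pure-Python sketch below mirrors what the two transformers compute. The function names and toy values are hypothetical, and this is not the scikit-learn implementation.

```python
def fit_min_max(train_column):
    """Learn the minimum and range of one numerical column from training rows."""
    lo, hi = min(train_column), max(train_column)
    span = (hi - lo) or 1.0  # a zero range is treated as 1, as MinMaxScaler does
    return lo, span


def apply_min_max(column, lo, span):
    """Scale values with parameters learned on the training fold."""
    return [(x - lo) / span for x in column]


def fit_one_hot(train_column):
    """Learn the sorted category list of one categorical column."""
    return sorted(set(train_column))


def apply_one_hot(column, categories):
    """Encode each value as a 0/1 indicator row; unseen values raise, like handle_unknown='error'."""
    rows = []
    for value in column:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        rows.append([1 if value == c else 0 for c in categories])
    return rows


# Hypothetical toy values, not taken from the katG data.
lo, span = fit_min_max([2.0, 4.0, 10.0])      # training fold
print(apply_min_max([4.0, 12.0], lo, span))    # held-out fold: [0.25, 1.25]
cats = fit_one_hot(["loop", "helix", "strand"])
print(cats, apply_one_hot(["strand"], cats))   # ['helix', 'loop', 'strand'] [[0, 0, 1]]
```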

key: fit_time
value: [0.00824833 0.00824618 0.00818968 0.00823784 0.00803328 0.00808263
 0.0080018  0.00826025 0.00804806 0.00802422]

mean value: 0.008137226104736328

key: score_time
value: [0.02001548 0.0169642  0.01295042 0.01175404 0.01527023 0.01145744
 0.01146245 0.01176381 0.01168776 0.01165533]

mean value: 0.01349811553955078

key: test_mcc
value: [0.75623534 0.67741935 0.64820372 0.83914639 0.80813523 0.74193548
 0.61418277 0.68313005 0.67204301 0.67721392]

mean value: 0.7117645281572317

key: train_mcc
value: [0.75664991 0.80977699 0.79501032 0.78789723 0.7814304  0.77770329
 0.79138739 0.77342633 0.78180276 0.78587941]

mean value: 0.7840964017204444

key: test_accuracy
value: [0.87096774 0.83870968 0.82258065 0.91935484 0.90322581 0.87096774
 0.80645161 0.83870968 0.83606557 0.83606557]

mean value: 0.8543098889476468

key: train_accuracy
value: [0.87769784 0.90467626 0.89748201 0.89388489 0.89028777 0.88848921
 0.89568345 0.88669065 0.89048474 0.89228007]

mean value: 0.8917656897821061

key: test_fscore
value: [0.85714286 0.83870968 0.81355932 0.92063492 0.90625    0.87096774
 0.8125     0.82758621 0.83870968 0.82142857]

mean value: 0.8507488974910993

key: train_fscore
value: [0.87407407 0.90310786 0.89692586 0.89292196 0.88766114 0.88602941
 0.89605735 0.88607595 0.88766114 0.88929889]

mean value: 0.8899813639558726

key: test_precision
value: [0.96       0.83870968 0.85714286 0.90625    0.87878788 0.87096774
 0.78787879 0.88888889 0.83870968 0.88461538]

mean value: 0.8711950894087991

key: train_precision
value: [0.90076336 0.91821561 0.90181818 0.9010989  0.90943396 0.90601504
 0.89285714 0.89090909 0.90943396 0.91634981]

mean value: 0.9046895060853061

key: test_recall
value: [0.77419355 0.83870968 0.77419355 0.93548387 0.93548387 0.87096774
 0.83870968 0.77419355 0.83870968 0.76666667]

mean value: 0.8347311827956989

key: train_recall
value: [0.84892086 0.88848921 0.89208633 0.88489209 0.86690647 0.86690647
 0.89928058 0.88129496 0.86690647 0.86379928]

mean value: 0.8759482736391532

key: test_roc_auc
value: [0.87096774 0.83870968 0.82258065 0.91935484 0.90322581 0.87096774
 0.80645161 0.83870968 0.83602151 0.83494624]

mean value: 0.8541935483870968

key: train_roc_auc
value: [0.87769784 0.90467626 0.89748201 0.89388489 0.89028777 0.88848921
 0.89568345 0.88669065 0.89044248 0.8923313 ]

mean value: 0.8917665867306155

key: test_jcc
value: [0.75       0.72222222 0.68571429 0.85294118 0.82857143 0.77142857
 0.68421053 0.70588235 0.72222222 0.6969697 ]

mean value: 0.7420162482855981

key: train_jcc
value: [0.77631579 0.82333333 0.81311475 0.80655738 0.79801325 0.79537954
 0.81168831 0.79545455 0.79801325 0.80066445]

mean value: 0.8018534590944678

MCC on Blind test: 0.17

Accuracy on Blind test: 0.56
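The fold-level scores can be cross-checked against a single confusion matrix. For the first K-Nearest Neighbors fold, accuracy 0.87096774 = 54/62, recall 0.77419355 = 24/31 and precision 0.96 = 24/25 imply TP = 24, FN = 7, FP = 1 and TN = 30, assuming 62 samples in that held-out fold. These counts are a reconstruction, not values printed by the script. The sketch recomputes every logged metric from them. The logged `test_roc_auc` matching balanced accuracy is consistent with AUC being scored on hard 0/1 predictions rather than probabilities, which is also an inference.

```python
from math import sqrt


def fold_metrics(tp, fp, tn, fn):
    """Binary classification metrics from one fold's confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        # ROC AUC on hard 0/1 predictions reduces to balanced accuracy
        "roc_auc": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Counts inferred for K-Nearest Neighbors, fold 1 (see lead-in).
m = fold_metrics(tp=24, fp=1, tn=30, fn=7)
for name, value in m.items():
    print(f"{name:<10} {value:.8f}")
```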

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01743269 0.01715159 0.01687074 0.01602173 0.01673126 0.01642895
 0.01583314 0.01826096 0.01568484 0.0176158 ]

mean value: 0.016803169250488283

key: score_time
value: [0.01035261 0.00926685 0.01012945 0.00933409 0.00936246 0.01025271
 0.00945616 0.01029134 0.00932956 0.0092721 ]

mean value: 0.00970473289489746

key: test_mcc
value: [0.93548387 0.69047575 0.62471615 0.77784447 0.77784447 0.75623534
 0.58338335 0.74348441 0.61090565 0.81062315]

mean value: 0.7310996615906107

key: train_mcc
value: [0.82186847 0.79485081 0.75204143 0.78877892 0.78485761 0.7611094
 0.79209132 0.77560672 0.78260516 0.81085297]

mean value: 0.7864662785132636

key: test_accuracy
value: [0.96774194 0.83870968 0.80645161 0.88709677 0.88709677 0.87096774
 0.79032258 0.87096774 0.80327869 0.90163934]

mean value: 0.8624272871496562

key: train_accuracy
value: [0.91007194 0.89568345 0.87230216 0.89208633 0.89028777 0.87769784
 0.89388489 0.88489209 0.88868941 0.9048474 ]

mean value: 0.8910443279128941

key: test_fscore
value: [0.96774194 0.85294118 0.82352941 0.89230769 0.89230769 0.88235294
 0.8        0.875      0.81818182 0.90625   ]

mean value: 0.8710612667692839

key: train_fscore
value: [0.91289199 0.90034364 0.88067227 0.89761092 0.8957265  0.88474576
 0.8991453  0.89152542 0.89455782 0.90750436]

mean value: 0.8964723986527141

key: test_precision
value: [0.96774194 0.78378378 0.75675676 0.85294118 0.85294118 0.81081081
 0.76470588 0.84848485 0.77142857 0.85294118]

mean value: 0.8262536118513348

key: train_precision
value: [0.88513514 0.86184211 0.82649842 0.8538961  0.8534202  0.83653846
 0.85667752 0.84294872 0.8483871  0.88435374]

mean value: 0.8549697504635009

key: test_recall
value: [0.96774194 0.93548387 0.90322581 0.93548387 0.93548387 0.96774194
 0.83870968 0.90322581 0.87096774 0.96666667]

mean value: 0.9224731182795699

key: train_recall
value: [0.94244604 0.94244604 0.94244604 0.94604317 0.94244604 0.93884892
 0.94604317 0.94604317 0.94604317 0.93189964]

mean value: 0.9424705396972745

key: test_roc_auc
value: [0.96774194 0.83870968 0.80645161 0.88709677 0.88709677 0.87096774
 0.79032258 0.87096774 0.80215054 0.90268817]

mean value: 0.8624193548387098

key: train_roc_auc
value: [0.91007194 0.89568345 0.87230216 0.89208633 0.89028777 0.87769784
 0.89388489 0.88489209 0.88879219 0.90479874]

mean value: 0.8910497408524792

key: test_jcc
value: [0.9375     0.74358974 0.7        0.80555556 0.80555556 0.78947368
 0.66666667 0.77777778 0.69230769 0.82857143]

mean value: 0.7746998104234947

key: train_jcc
value: [0.83974359 0.81875    0.78678679 0.81424149 0.81114551 0.79331307
 0.81677019 0.80428135 0.80923077 0.83067093]

mean value: 0.812493367099271

MCC on Blind test: 0.25

Accuracy on Blind test: 0.48

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.58698964 1.47472477 1.58772182 1.53978014 1.45300221 1.59102941
 1.59130311 1.50179529 1.59061027 1.5674026 ]

mean value: 1.548435926437378

key: score_time
value: [0.01429367 0.01347637 0.01343799 0.0135057  0.01355076 0.01363134
 0.01368833 0.01345825 0.01342821 0.01383781]

mean value: 0.013630843162536621

key: test_mcc
value: [0.96824584 0.84266484 0.87278605 0.93743687 0.93743687 0.90369611
 0.90369611 0.90748521 0.83638369 0.8688172 ]

mean value: 0.8978648796280239

key: train_mcc
value: [0.99283145 0.98921503 0.98921503 0.98561151 0.98921503 0.98921503
 0.98561151 0.99640932 0.99284416 0.99641577]

mean value: 0.9906583855147647

key: test_accuracy
value: [0.98387097 0.91935484 0.93548387 0.96774194 0.96774194 0.9516129
 0.9516129  0.9516129  0.91803279 0.93442623]

mean value: 0.9481491274457959

key: train_accuracy
value: [0.99640288 0.99460432 0.99460432 0.99280576 0.99460432 0.99460432
 0.99280576 0.99820144 0.99640934 0.99820467]

mean value: 0.9953247097115845

key: test_fscore
value: [0.98412698 0.92307692 0.9375     0.96875    0.96875    0.95081967
 0.95238095 0.94915254 0.92063492 0.93333333]

mean value: 0.9488525328057142

key: train_fscore
value: [0.99638989 0.99459459 0.99459459 0.99280576 0.99459459 0.99459459
 0.99280576 0.9981982  0.99638989 0.99820467]

mean value: 0.9953172538625

key: test_precision
value: [0.96875    0.88235294 0.90909091 0.93939394 0.93939394 0.96666667
 0.9375     1.         0.90625    0.93333333]

mean value: 0.9382731729055258

key: train_precision
value: [1.         0.99638989 0.99638989 0.99280576 0.99638989 0.99638989
 0.99280576 1.         1.         1.        ]

mean value: 0.997117107757837

key: test_recall
value: [1.         0.96774194 0.96774194 1.         1.         0.93548387
 0.96774194 0.90322581 0.93548387 0.93333333]

mean value: 0.9610752688172043

key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576
 0.99280576 0.99640288 0.99280576 0.99641577]

mean value: 0.9935264691472628

key: test_roc_auc
value: [0.98387097 0.91935484 0.93548387 0.96774194 0.96774194 0.9516129
 0.9516129  0.9516129  0.91774194 0.9344086 ]

mean value: 0.9481182795698926

key: train_roc_auc
value: [0.99640288 0.99460432 0.99460432 0.99280576 0.99460432 0.99460432
 0.99280576 0.99820144 0.99640288 0.99820789]

mean value: 0.9953243856527682

key: test_jcc
value: [0.96875    0.85714286 0.88235294 0.93939394 0.93939394 0.90625
 0.90909091 0.90322581 0.85294118 0.875     ]

mean value: 0.9033541569120317

key: train_jcc
value: [0.99280576 0.98924731 0.98924731 0.98571429 0.98924731 0.98924731
 0.98571429 0.99640288 0.99280576 0.99641577]

mean value: 0.9906847977838927

MCC on Blind test: 0.18

Accuracy on Blind test: 0.36
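The MLP run emitted eleven `ConvergenceWarning`s reporting that the stochastic optimizer stopped at `max_iter=500` without converging, so its near-perfect training scores come from networks halted at the iteration cap. Raising `max_iter` or enabling `MLPClassifier`'s `early_stopping` are commonly tried remedies. To make such warnings countable per model instead of scrolling past on stderr, one option is the standard-library `warnings.catch_warnings(record=True)` context manager. In this sketch `ConvergenceWarning` is a local stand-in for `sklearn.exceptions.ConvergenceWarning`, and `hypothetical_fit` is a placeholder, not the pipeline's fit.

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.ConvergenceWarning (also a UserWarning subclass)."""


def fit_counting_warnings(fit_fn, *args, **kwargs):
    """Call fit_fn and return (its result, number of ConvergenceWarnings it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeats instead of showing only the first
        result = fit_fn(*args, **kwargs)
    n_warnings = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return result, n_warnings


def hypothetical_fit(max_iter, iters_needed=800):
    """Placeholder for an optimizer that needs iters_needed steps to converge."""
    if iters_needed > max_iter:
        warnings.warn(f"Maximum iterations ({max_iter}) reached", ConvergenceWarning)
    return min(max_iter, iters_needed)


print(fit_counting_warnings(hypothetical_fit, 500))   # (500, 1)
print(fit_counting_warnings(hypothetical_fit, 1000))  # (800, 0)
```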

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01465797 0.01315546 0.01135731 0.01076937 0.01108599 0.01082325
 0.01092386 0.01045871 0.01094151 0.01048708]

mean value: 0.011466050148010254

key: score_time
value: [0.01047611 0.00827646 0.00819159 0.00810289 0.00804043 0.00810003
 0.00786495 0.00790858 0.00797296 0.00794983]

mean value: 0.008288383483886719

key: test_mcc
value: [0.96824584 0.90369611 1.         0.90748521 0.90369611 0.87831007
 0.84266484 0.96824584 0.8688172  0.87055472]

mean value: 0.9111715945488771

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.9516129  1.         0.9516129  0.9516129  0.93548387
 0.91935484 0.98387097 0.93442623 0.93442623]

mean value: 0.9546271813855103

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98412698 0.95081967 1.         0.95384615 0.95081967 0.93103448
 0.91525424 0.98360656 0.93548387 0.93103448]

mean value: 0.9536026113385601

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96875    0.96666667 1.         0.91176471 0.96666667 1.
 0.96428571 1.         0.93548387 0.96428571]

mean value: 0.9677903338754856

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.93548387 1.         1.         0.93548387 0.87096774
 0.87096774 0.96774194 0.93548387 0.9       ]

mean value: 0.9416129032258065

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.9516129  1.         0.9516129  0.9516129  0.93548387
 0.91935484 0.98387097 0.9344086  0.93387097]

mean value: 0.9545698924731183

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96875    0.90625    1.         0.91176471 0.90625    0.87096774
 0.84375    0.96774194 0.87878788 0.87096774]

mean value: 0.912523000402507

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.58

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10264206 0.10256195 0.10940504 0.10670924 0.10698891 0.10749125
 0.10634494 0.1044426  0.1049583  0.10701418]

mean value: 0.10585584640502929

key: score_time
value: [0.01734233 0.01776242 0.01870441 0.01858568 0.01841116 0.01849437
 0.01723242 0.01806641 0.01844049 0.01706672]

mean value: 0.018010640144348146

key: test_mcc
value: [0.93743687 0.81325006 0.87096774 0.87278605 0.93743687 0.90369611
 0.80645161 0.93743687 0.8688172  0.90215054]

mean value: 0.8850429919540547

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96774194 0.90322581 0.93548387 0.93548387 0.96774194 0.9516129
 0.90322581 0.96774194 0.93442623 0.95081967]

mean value: 0.9417503966155474

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96875    0.90909091 0.93548387 0.9375     0.96875    0.95238095
 0.90322581 0.96666667 0.93548387 0.95081967]

mean value: 0.9428151748656772

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.93939394 0.85714286 0.93548387 0.90909091 0.93939394 0.9375
 0.90322581 1.         0.93548387 0.93548387]

mean value: 0.9292199064376484

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 0.93548387 0.96774194 1.         0.96774194
 0.90322581 0.93548387 0.93548387 0.96666667]

mean value: 0.9579569892473119

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96774194 0.90322581 0.93548387 0.93548387 0.96774194 0.9516129
 0.90322581 0.96774194 0.9344086  0.95107527]

mean value: 0.9417741935483872

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.93939394 0.83333333 0.87878788 0.88235294 0.93939394 0.90909091
 0.82352941 0.93548387 0.87878788 0.90625   ]

mean value: 0.8926404102696797

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.22

Accuracy on Blind test: 0.4
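The "MCC on Blind test" and "Accuracy on Blind test" lines appear to be the metric values rounded to two decimals. Below is a minimal from-scratch sketch of that step. The function names are hypothetical, and the script itself may well use sklearn.metrics instead. So that the result can be checked against numbers already in this log, the labels are rebuilt from the confusion counts implied by the Extra Trees fold-10 test scores above (precision 29/31, recall 29/30, accuracy 58/61). They do not come from the blind set.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when a marginal is empty (as sklearn does)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels: TP=29, FN=1, FP=2, TN=29 (Extra Trees fold 10 in this log).
y_true = [1] * 30 + [0] * 31
y_pred = [1] * 29 + [0] * 1 + [1] * 2 + [0] * 29

print(f"MCC on Blind test: {round(mcc(y_true, y_pred), 2)}")
print(f"Accuracy on Blind test: {round(accuracy(y_true, y_pred), 2)}")
```

This prints an MCC of 0.9 and an accuracy of 0.95. Before rounding, the MCC is 839/930 ≈ 0.90215054, the logged fold-10 test_mcc.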

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])
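The pipeline repr above scales the numeric columns with MinMaxScaler and one-hot encodes the categorical ones. Any remaining columns pass through unchanged before the classifier. Below is a minimal sketch of assembling the same structure with scikit-learn. build_pipeline is a hypothetical helper, and the toy DataFrame uses a small subset of the logged column names with made-up values.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import ExtraTreeClassifier

def build_pipeline(num_cols, cat_cols, model):
    """Scale numeric columns to [0, 1], one-hot encode categoricals, pass the rest through."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])

# Toy frame standing in for the real feature table (values are made up).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 20),
    "rsa": rng.uniform(0, 1, 20),
    "ss_class": rng.choice(["helix", "sheet", "loop"], 20),
    "active_site": rng.choice(["0", "1"], 20),
})
y = rng.integers(0, 2, 20)

pipe = build_pipeline(["ligand_distance", "rsa"], ["ss_class", "active_site"],
                      ExtraTreeClassifier(random_state=42))
pipe.fit(X, y)
print(pipe.predict(X.iloc[:5]))
```

Because the scaler and encoder are fitted inside the Pipeline, cross-validation refits them on each training fold. This keeps the held-out fold out of the preprocessing statistics.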

key: fit_time
value: [0.00817394 0.0080502  0.0084374  0.00861168 0.00831032 0.00796342
 0.00770545 0.00855494 0.00841975 0.00792027]

mean value: 0.008214735984802246

key: score_time
value: [0.00823379 0.00851464 0.00845742 0.00857925 0.00831676 0.00783062
 0.00798845 0.00863981 0.00799108 0.00859761]

mean value: 0.008314943313598633

key: test_mcc
value: [0.71004695 0.5809475  0.67883359 0.59603956 0.64549722 0.77784447
 0.65372045 0.67883359 0.77072165 0.77096774]

mean value: 0.68634527326083

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.85483871 0.79032258 0.83870968 0.79032258 0.82258065 0.88709677
 0.82258065 0.83870968 0.8852459  0.8852459 ]

mean value: 0.841565309360127

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85245902 0.79365079 0.83333333 0.76363636 0.81967213 0.88135593
 0.80701754 0.83333333 0.88888889 0.8852459 ]

mean value: 0.835859323808608

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.86666667 0.78125    0.86206897 0.875      0.83333333 0.92857143
 0.88461538 0.86206897 0.875      0.87096774]

mean value: 0.8639542486156779

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.83870968 0.80645161 0.80645161 0.67741935 0.80645161 0.83870968
 0.74193548 0.80645161 0.90322581 0.9       ]

mean value: 0.8125806451612902

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.85483871 0.79032258 0.83870968 0.79032258 0.82258065 0.88709677
 0.82258065 0.83870968 0.88494624 0.88548387]

mean value: 0.8415591397849462

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.74285714 0.65789474 0.71428571 0.61764706 0.69444444 0.78787879
 0.67647059 0.71428571 0.8        0.79411765]

mean value: 0.7199881834711557

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
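The key / value / mean value blocks match the dictionary that scikit-learn's cross_validate returns with return_train_score=True. In that dictionary, fit_time and score_time come first, and each test_ key is followed by its train_ counterpart. Below is a minimal sketch under that assumption. The scorer names (mcc, fscore, jcc, ...) are inferred from the printed keys, and cv=10 matches the ten values per key. make_classification stands in for the real features. make_scorer(roc_auc_score) scores hard predictions, and this is an assumption about how test_roc_auc was produced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import cross_validate
from sklearn.tree import ExtraTreeClassifier

# Assumed scorer names: each key becomes a "test_<key>" / "train_<key>" entry.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # computed on hard 0/1 predictions
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
cv_out = cross_validate(ExtraTreeClassifier(random_state=42), X, y,
                        cv=10, scoring=scoring, return_train_score=True)

def report(results):
    """Print each entry in the key / value / mean value layout used by this log."""
    for key, value in results.items():
        print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")

report(cv_out)
```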

MCC on Blind test: 0.07

Accuracy on Blind test: 0.43

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.35455632 1.36369705 1.44976234 1.43192887 1.36655641 1.41406369
 1.44773722 1.37284899 1.38293886 1.38959265]

mean value: 1.3973682403564454

key: score_time
value: [0.09139943 0.09957314 0.09985614 0.09845757 0.09911156 0.0994699
 0.09767675 0.09422445 0.09873199 0.09957123]

mean value: 0.09780721664428711

key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.93743687 0.96824584
 1.         0.96824584 0.96770777 0.8688172 ]

mean value: 0.9489914273704848

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129  0.96774194 0.98387097
 1.         0.98387097 0.98360656 0.93442623]

mean value: 0.9740613432046537

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98412698 0.96774194 0.98412698 0.95384615 0.96875    0.98360656
 1.         0.98360656 0.98412698 0.93333333]

mean value: 0.9743265489798408

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96875    0.96774194 0.96875    0.91176471 0.93939394 1.
 1.         1.         0.96875    0.93333333]

mean value: 0.9658483914093496

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 1.         0.96774194 1.         0.93333333]

mean value: 0.9836559139784946

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129  0.96774194 0.98387097
 1.         0.98387097 0.98333333 0.9344086 ]

mean value: 0.9740322580645162

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96875    0.9375     0.96875    0.91176471 0.93939394 0.96774194
 1.         0.96774194 0.96875    0.875     ]

mean value: 0.9505392516244034

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.16

Accuracy on Blind test: 0.35
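In the binary case, every test score in a fold follows from that fold's confusion matrix. As a worked check, consider the Random Forest fold-10 values above: precision and recall 0.93333333, accuracy 0.93442623, test_jcc 0.875 and test_mcc 0.8688172. They are consistent with TP=28, FP=2, FN=2, TN=29. Those counts are inferred here and are not printed by the script. The sketch also shows that, for the positive class, the Jaccard score is fixed by the F1 score via J = F1 / (2 - F1). This is why test_jcc moves in lockstep with test_fscore.

```python
import math

def metrics_from_counts(tp, fp, fn, tn):
    """Binary classification scores for the positive class, from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

def jaccard_from_f1(f1):
    """J = F1 / (2 - F1), valid when both use the same positive class."""
    return f1 / (2 - f1)

# Counts inferred from the Random Forest fold-10 scores in this log.
m = metrics_from_counts(tp=28, fp=2, fn=2, tn=29)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
print(f"jcc via F1: {jaccard_from_f1(m['fscore']):.8f}")
```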

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])

key: fit_time
value: [0.87023854 0.93618393 0.92621827 0.95837045 1.00464082 0.93836141
 0.98241544 0.91587925 0.8986578  0.98821497]

mean value: 0.9419180870056152

key: score_time
value: [0.23300123 0.2598815  0.26490426 0.22142696 0.22671819 0.23441744
 0.25722957 0.27357078 0.23566699 0.21245193]

mean value: 0.2419268846511841

key: test_mcc
value: [0.96824584 0.87278605 0.93743687 0.90748521 0.93743687 0.96824584
 0.96824584 0.96824584 0.93635873 0.83655914]

mean value: 0.9301046213212982

key: train_mcc
value: [0.96778244 0.97132357 0.96778244 0.97487691 0.96768225 0.96768225
 0.96778244 0.96778244 0.97137553 0.9784809 ]

mean value: 0.9702551166949516

key: test_accuracy
value: [0.98387097 0.93548387 0.96774194 0.9516129  0.96774194 0.98387097
 0.98387097 0.98387097 0.96721311 0.91803279]

mean value: 0.9643310417768377

key: train_accuracy
value: [0.98381295 0.98561151 0.98381295 0.98741007 0.98381295 0.98381295
 0.98381295 0.98381295 0.98563734 0.98922801]

mean value: 0.9850764630665306

key: test_fscore
value: [0.98412698 0.9375     0.96875    0.95384615 0.96875    0.98360656
 0.98412698 0.98360656 0.96875    0.91803279]

mean value: 0.9651096023739466

key: train_fscore
value: [0.98395722 0.98571429 0.98395722 0.98747764 0.98389982 0.98389982
 0.98395722 0.98395722 0.98571429 0.98928571]

mean value: 0.985182044357831

key: test_precision
value: [0.96875    0.90909091 0.93939394 0.91176471 0.93939394 1.
 0.96875    1.         0.93939394 0.90322581]

mean value: 0.9479763239606693

key: train_precision
value: [0.97526502 0.9787234  0.97526502 0.98220641 0.97864769 0.97864769
 0.97526502 0.97526502 0.9787234  0.98576512]

mean value: 0.9783773783096608

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 1.         0.96774194 1.         0.93333333]

mean value: 0.9836559139784946

key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.98920863 0.98920863
 0.99280576 0.99280576 0.99280576 0.99283154]

mean value: 0.9920889095175472

key: test_roc_auc
value: [0.98387097 0.93548387 0.96774194 0.9516129  0.96774194 0.98387097
 0.98387097 0.98387097 0.96666667 0.91827957]

mean value: 0.9643010752688173

key: train_roc_auc
value: [0.98381295 0.98561151 0.98381295 0.98741007 0.98381295 0.98381295
 0.98381295 0.98381295 0.98565019 0.98922153]

mean value: 0.9850770996106342

key: test_jcc
value: [0.96875    0.88235294 0.93939394 0.91176471 0.93939394 0.96774194
 0.96875    0.96774194 0.93939394 0.84848485]

mean value: 0.9333768184693232

key: train_jcc
value: [0.96842105 0.97183099 0.96842105 0.97526502 0.96830986 0.96830986
 0.96842105 0.96842105 0.97183099 0.97879859]

mean value: 0.9708029504907444

MCC on Blind test: 0.15

Accuracy on Blind test: 0.4
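The FutureWarning printed for Random Forest2 says that max_features='auto' has been deprecated since scikit-learn 1.1. To keep the same behaviour, it suggests max_features='sqrt'. Below is a minimal sketch of the Random Forest2 configuration with that substitution. The hyperparameters are copied from the log, and make_classification is a stand-in for the real data. Warnings are recorded rather than raised, so an unrelated warning cannot stop the fit.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(
    max_features="sqrt",  # what 'auto' meant for classifiers; avoids the deprecation
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

X, y = make_classification(n_samples=300, n_features=20, random_state=42)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # capture every warning raised during fit
    rf2.fit(X, y)

print(f"OOB accuracy: {rf2.oob_score_:.3f}")
print(f"max_features warnings: {sum('max_features' in str(w.message) for w in caught)}")
```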

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01861191 0.00832939 0.00832176 0.00824809 0.00866604 0.00814605
 0.00879598 0.00842834 0.00795603 0.00822353]

mean value: 0.009372711181640625

key: score_time
value: [0.00951219 0.00824142 0.00890613 0.00865459 0.00877047 0.00828552
 0.00873399 0.00808811 0.00830865 0.00846767]

mean value: 0.00859687328338623

key: test_mcc
value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441
 0.64820372 0.74193548 0.63978495 0.67204301]

mean value: 0.6813286129520032

key: train_mcc
value: [0.69129181 0.69623388 0.69785979 0.69872831 0.69209976 0.70569372
 0.7019886  0.70220704 0.69929441 0.69881448]

mean value: 0.698421180066379

key: test_accuracy
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
 0.82258065 0.87096774 0.81967213 0.83606557]

mean value: 0.8397673188789001

key: train_accuracy
value: [0.84532374 0.8471223  0.84892086 0.84892086 0.84532374 0.85251799
 0.85071942 0.85071942 0.8491921  0.8491921 ]

mean value: 0.8487952546400941

key: test_fscore
value: [0.81355932 0.84848485 0.75       0.875      0.9        0.875
 0.83076923 0.87096774 0.81967213 0.83333333]

mean value: 0.8416786607704336

key: train_fscore
value: [0.84859155 0.85268631 0.84837545 0.85263158 0.85017422 0.8556338
 0.85361552 0.85413005 0.85263158 0.85211268]

mean value: 0.8520582734853629

key: test_precision
value: [0.85714286 0.8        0.72727273 0.84848485 0.93103448 0.84848485
 0.79411765 0.87096774 0.83333333 0.83333333]

mean value: 0.8344171819804876

key: train_precision
value: [0.83103448 0.82274247 0.85144928 0.83219178 0.82432432 0.83793103
 0.83737024 0.83505155 0.83219178 0.83737024]

mean value: 0.8341657184309065

key: test_recall
value: [0.77419355 0.90322581 0.77419355 0.90322581 0.87096774 0.90322581
 0.87096774 0.87096774 0.80645161 0.83333333]

mean value: 0.8510752688172043

key: train_recall
value: [0.86690647 0.88489209 0.84532374 0.87410072 0.87769784 0.87410072
 0.8705036  0.87410072 0.87410072 0.86738351]

mean value: 0.8709110131249839

key: test_roc_auc
value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774
 0.82258065 0.87096774 0.81989247 0.83602151]

mean value: 0.8397849462365592

key: train_roc_auc
value: [0.84532374 0.8471223  0.84892086 0.84892086 0.84532374 0.85251799
 0.85071942 0.85071942 0.84923674 0.84915938]

mean value: 0.8487964467135968

key: test_jcc
value: [0.68571429 0.73684211 0.6        0.77777778 0.81818182 0.77777778
 0.71052632 0.77142857 0.69444444 0.71428571]

mean value: 0.7286978810663021

key: train_jcc
value: [0.73700306 0.74320242 0.73667712 0.74311927 0.73939394 0.74769231
 0.74461538 0.74539877 0.74311927 0.74233129]

mean value: 0.7422552816171282

MCC on Blind test: 0.19

Accuracy on Blind test: 0.49
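In this log, test_roc_auc equals test_accuracy in the eight 62-sample folds and differs slightly only in the last two 61-sample folds. Those are the only folds where the classes are unequal in size. The pattern is consistent with ROC AUC being computed on hard 0/1 predictions, in which case AUC equals balanced accuracy, (TPR + TNR) / 2. The sketch below checks this with a pairwise (Mann-Whitney) AUC. It uses the Naive Bayes fold-9 counts inferred from the log (TP=25, FN=6, FP=5, TN=25). Both functions are hypothetical helpers. Whether the script scores hard labels is an inference, and the log does not state it.

```python
def roc_auc_pairwise(y_true, scores):
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(p == 1 for p in pos) / len(pos)
    tnr = sum(p == 0 for p in neg) / len(neg)
    return (tpr + tnr) / 2

# Labels rebuilt from the Naive Bayes fold-9 counts: TP=25, FN=6, FP=5, TN=25.
y_true = [1] * 31 + [0] * 30
y_pred = [1] * 25 + [0] * 6 + [1] * 5 + [0] * 25

print(f"AUC on hard labels: {roc_auc_pairwise(y_true, y_pred):.8f}")
print(f"Balanced accuracy:  {balanced_accuracy(y_true, y_pred):.8f}")
```

Both lines print 0.81989247, which is the logged fold-9 test_roc_auc for Naive Bayes. The plain accuracy for that fold is 50/61 ≈ 0.81967213.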

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.08428884 0.04923701 0.12757492 0.1029861  0.05474067 0.05481815
 0.06141877 0.06270385 0.06345892 0.05934381]

mean value: 0.07205710411071778

key: score_time
value: [0.01002645 0.00963044 0.01171899 0.01000237 0.00956392 0.00953889
 0.00953102 0.00952125 0.00951862 0.00952578]

mean value: 0.009857773780822754

key: test_mcc
value: [0.96824584 0.90369611 0.93743687 0.90748521 0.90369611 0.93743687
 1.         0.96824584 0.96770777 0.8688172 ]

mean value: 0.9362767824424617

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.9516129  0.96774194 0.9516129  0.9516129  0.96774194
 1.         0.98387097 0.98360656 0.93442623]

mean value: 0.9676097303014278

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98412698 0.95238095 0.96875    0.95384615 0.95238095 0.96666667
 1.         0.98360656 0.98412698 0.93333333]

mean value: 0.9679218584239075

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96875    0.9375     0.93939394 0.91176471 0.9375     1.
 1.         1.         0.96875    0.93333333]

mean value: 0.9596991978609626

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         0.96774194 0.93548387
 1.         0.96774194 1.         0.93333333]

mean value: 0.9772043010752688

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.9516129  0.96774194 0.9516129  0.9516129  0.96774194
 1.         0.98387097 0.98333333 0.9344086 ]

mean value: 0.9675806451612904

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96875    0.90909091 0.93939394 0.91176471 0.90909091 0.93548387
 1.         0.96774194 0.96875    0.875     ]

mean value: 0.9385066269909723

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
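The key/value pairs above have the shape of scikit-learn's `cross_validate` output with `return_train_score=True`: `fit_time` and `score_time` come first, followed by a `test_<name>` / `train_<name>` pair for each scorer. Each "mean value" appears to be the plain mean of the 10 per-fold values. The sketch below reproduces that layout. The scoring dict is an assumption: its names were chosen only to match the logged keys, and the toy data and `LogisticRegression` stand in for the real features and models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical scoring dict; the names mirror the logged keys (test_mcc, test_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # assumption: scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

# Toy data standing in for the real training set
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

results = cross_validate(
    LogisticRegression(max_iter=1000), X, y,
    cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same "key / value / mean value" layout as the log
for key, value in results.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```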

MCC on Blind test: 0.1

Accuracy on Blind test: 0.61
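The two blind-test lines report MCC and accuracy on data held out from cross-validation, and both appear to be rounded to two decimals. The following is a minimal sketch of that step under stated assumptions: the synthetic data, the 70/30 `train_test_split` and the LDA pipeline are hypothetical stand-ins for the script's actual training/blind split and models.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Hypothetical split: the blind set is never seen during cross-validation
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = make_pipeline(MinMaxScaler(), LinearDiscriminantAnalysis())
model.fit(X_train, y_train)
y_pred = model.predict(X_blind)

# Rounded to two decimals, matching the formatting of the logged values
bts_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
bts_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", bts_mcc)
print("\nAccuracy on Blind test:", bts_acc)
```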

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])
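The pipeline repr above shows the preprocessing shared by every model: a `ColumnTransformer` with `remainder='passthrough'` that applies `MinMaxScaler` to the numerical columns and `OneHotEncoder` to the categorical ones, followed by the classifier. The sketch below rebuilds that structure on a toy DataFrame. Only a few of the logged column names are reused; their values, the category labels and the target vector are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 40

# Toy frame using a subset of the logged column names (random values)
df = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "consurf_score": rng.normal(size=n),
    "ss_class": np.resize(["helix", "sheet", "loop"], n),
    "active_site": np.resize([1, 0, 0], n),
})
y = np.tile([0, 1], n // 2)

num_cols = ["ligand_distance", "rsa", "consurf_score"]
cat_cols = ["ss_class", "active_site"]

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(remainder="passthrough", transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ])),
    ("model", LinearDiscriminantAnalysis()),
])
pipe.fit(df, y)

# 3 scaled columns + 3 ss_class dummies + 2 active_site dummies
X_prep = pipe.named_steps["prep"].transform(df)
print(X_prep.shape)
```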

key: fit_time
value: [0.01458907 0.04201937 0.02599144 0.01775765 0.04186487 0.0279336
 0.01842332 0.04155302 0.04179025 0.01767874]

mean value: 0.028960132598876955

key: score_time
value: [0.01030087 0.02038527 0.01068902 0.01067185 0.01916838 0.01074195
 0.01076746 0.01074457 0.02005053 0.010741  ]

mean value: 0.013426089286804199

key: test_mcc
value: [0.96824584 0.87278605 1.         0.90748521 0.96824584 0.96824584
 1.         0.93743687 0.87082935 0.83655914]

mean value: 0.9329834129888399

key: train_mcc
value: [0.95329292 0.9497386  0.95329292 0.96048758 0.94966486 0.94966486
 0.93900081 0.95339163 0.95693712 0.96065614]

mean value: 0.9526127442796535

key: test_accuracy
value: [0.98387097 0.93548387 1.         0.9516129  0.98387097 0.98387097
 1.         0.96774194 0.93442623 0.91803279]

mean value: 0.9658910629296669

key: train_accuracy
value: [0.97661871 0.97482014 0.97661871 0.98021583 0.97482014 0.97482014
 0.96942446 0.97661871 0.97845601 0.98025135]

mean value: 0.9762664195394134

key: test_fscore
value: [0.98360656 0.9375     1.         0.95384615 0.98412698 0.98360656
 1.         0.96666667 0.93333333 0.91803279]

mean value: 0.9660719039612482

key: train_fscore
value: [0.97674419 0.975      0.97674419 0.980322   0.97491039 0.97491039
 0.96969697 0.97682709 0.97849462 0.98046181]

mean value: 0.9764111663751257

key: test_precision
value: [1.         0.90909091 1.         0.91176471 0.96875    1.
 1.         1.         0.96551724 0.90322581]

mean value: 0.9658348662804185

key: train_precision
value: [0.97153025 0.96808511 0.97153025 0.97508897 0.97142857 0.97142857
 0.96113074 0.96819788 0.975      0.97183099]

mean value: 0.9705251323255912

key: test_recall
value: [0.96774194 0.96774194 1.         1.         1.         0.96774194
 1.         0.93548387 0.90322581 0.93333333]

mean value: 0.9675268817204301

key: train_recall
value: [0.98201439 0.98201439 0.98201439 0.98561151 0.97841727 0.97841727
 0.97841727 0.98561151 0.98201439 0.98924731]

mean value: 0.9823779685928676

key: test_roc_auc
value: [0.98387097 0.93548387 1.         0.9516129  0.98387097 0.98387097
 1.         0.96774194 0.93494624 0.91827957]

mean value: 0.9659677419354838

key: train_roc_auc
value: [0.97661871 0.97482014 0.97661871 0.98021583 0.97482014 0.97482014
 0.96942446 0.97661871 0.97846239 0.98023517]

mean value: 0.976265439261494

key: test_jcc
value: [0.96774194 0.88235294 1.         0.91176471 0.96875    0.96774194
 1.         0.93548387 0.875      0.84848485]

mean value: 0.9357320237479156

key: train_jcc
value: [0.95454545 0.95121951 0.95454545 0.96140351 0.95104895 0.95104895
 0.94117647 0.95470383 0.95789474 0.96167247]

mean value: 0.9539259346206412

MCC on Blind test: 0.17

Accuracy on Blind test: 0.38
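In every fold of the LDA block, `test_jcc` equals `test_fscore / (2 - test_fscore)`. This is what one would expect if `jcc` is the positive-class Jaccard score: J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The check below uses the per-fold values copied from this block. The identity holds fold by fold. It does not hold between the two mean values, because the mapping is non-linear.

```python
import numpy as np

# Per-fold values copied from the LDA block of this log
test_fscore = np.array([0.98360656, 0.9375, 1.0, 0.95384615, 0.98412698,
                        0.98360656, 1.0, 0.96666667, 0.93333333, 0.91803279])
test_jcc = np.array([0.96774194, 0.88235294, 1.0, 0.91176471, 0.96875,
                     0.96774194, 1.0, 0.93548387, 0.875, 0.84848485])

# Positive-class Jaccard recovered from F1: J = F1 / (2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.round(jcc_from_f1, 8))

# Agreement within the 8-decimal rounding of the log
print(np.allclose(jcc_from_f1, test_jcc, atol=1e-7))
```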

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02236176 0.00778937 0.00771594 0.00752807 0.0074892  0.00744605
 0.00749993 0.007586   0.00749612 0.00748873]

mean value: 0.009040117263793945

key: score_time
value: [0.01843238 0.00818586 0.00802255 0.00780058 0.00774455 0.00785375
 0.00774026 0.00784397 0.00779438 0.00780678]

mean value: 0.008922505378723144

key: test_mcc
value: [0.77459667 0.65372045 0.55301004 0.74819006 0.74819006 0.7190925
 0.58338335 0.77459667 0.57576971 0.81062315]

mean value: 0.6941172654572817

key: train_mcc
value: [0.70194087 0.71536572 0.73033254 0.70140848 0.70140848 0.70194087
 0.72031981 0.70528679 0.72419371 0.70094494]

mean value: 0.7103142230314713

key: test_accuracy
value: [0.88709677 0.82258065 0.77419355 0.87096774 0.87096774 0.85483871
 0.79032258 0.88709677 0.78688525 0.90163934]

mean value: 0.8446589106292967

key: train_accuracy
value: [0.84892086 0.85611511 0.86330935 0.84892086 0.84892086 0.84892086
 0.85791367 0.85071942 0.85996409 0.8491921 ]

mean value: 0.8532897201090115

key: test_fscore
value: [0.8852459  0.8358209  0.78787879 0.87878788 0.87878788 0.86567164
 0.8        0.88888889 0.8        0.90625   ]

mean value: 0.8527331873296211

key: train_fscore
value: [0.85665529 0.86254296 0.86986301 0.85616438 0.85616438 0.85665529
 0.86541738 0.85811966 0.8668942  0.8556701 ]

mean value: 0.8604146652008448

key: test_precision
value: [0.9        0.77777778 0.74285714 0.82857143 0.82857143 0.80555556
 0.76470588 0.875      0.76470588 0.85294118]

mean value: 0.8140686274509804

key: train_precision
value: [0.81493506 0.82565789 0.83006536 0.81699346 0.81699346 0.81493506
 0.82200647 0.81758958 0.82467532 0.82178218]

mean value: 0.8205633864120958

key: test_recall
value: [0.87096774 0.90322581 0.83870968 0.93548387 0.93548387 0.93548387
 0.83870968 0.90322581 0.83870968 0.96666667]

mean value: 0.8966666666666666

key: train_recall
value: [0.9028777  0.9028777  0.91366906 0.89928058 0.89928058 0.9028777
 0.91366906 0.9028777  0.91366906 0.89247312]

mean value: 0.9043552254970217

key: test_roc_auc
value: [0.88709677 0.82258065 0.77419355 0.87096774 0.87096774 0.85483871
 0.79032258 0.88709677 0.78602151 0.90268817]

mean value: 0.8446774193548388

key: train_roc_auc
value: [0.84892086 0.85611511 0.86330935 0.84892086 0.84892086 0.84892086
 0.85791367 0.85071942 0.86006034 0.84911426]

mean value: 0.853291560300147

key: test_jcc
value: [0.79411765 0.71794872 0.65       0.78378378 0.78378378 0.76315789
 0.66666667 0.8        0.66666667 0.82857143]

mean value: 0.7454696589216713

key: train_jcc
value: [0.74925373 0.75830816 0.76969697 0.74850299 0.74850299 0.74925373
 0.76276276 0.75149701 0.76506024 0.74774775]

mean value: 0.7550586334969577

MCC on Blind test: 0.2

Accuracy on Blind test: 0.48
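`MultinomialNB` rejects negative feature values at fit time. In this pipeline, `MinMaxScaler` maps each numerical training column to [0, 1] and one-hot columns are 0/1, which is presumably why the model can be fitted here at all. The sketch below illustrates this on random toy data. It is not taken from the script.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))   # toy features; standard-normal draws include negatives
y = np.tile([0, 1], 15)

# Fitting on the raw values fails: MultinomialNB raises ValueError on negative inputs
try:
    MultinomialNB().fit(X, y)
    raw_fit_ok = True
except ValueError as err:
    raw_fit_ok = False
    print("raw fit failed:", err)

# After min-max scaling every training value lies in [0, 1], so the fit succeeds
pipe = make_pipeline(MinMaxScaler(), MultinomialNB()).fit(X, y)
scaled_min = MinMaxScaler().fit_transform(X).min()
print("raw fit ok:", raw_fit_ok, "| min scaled value:", scaled_min)
```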

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01076055 0.01259518 0.01484299 0.0152657  0.01288152 0.01513839
 0.01497507 0.01241827 0.01552725 0.01354051]

mean value: 0.013794541358947754

key: score_time
value: [0.00853276 0.01013088 0.01017213 0.01044273 0.01037955 0.01046228
 0.01040554 0.01038742 0.01037264 0.01043701]

mean value: 0.010172295570373534

key: test_mcc
value: [0.93743687 0.81325006 0.84983659 0.87831007 0.93548387 0.96824584
 0.93743687 0.90748521 0.87082935 0.70997538]

mean value: 0.8808290098706804

key: train_mcc
value: [0.89396219 0.81804143 0.8410572  0.96058703 0.93914669 0.95329292
 0.9354697  0.94266562 0.95337563 0.78144333]

mean value: 0.9019041746413544

key: test_accuracy
value: [0.96774194 0.90322581 0.91935484 0.93548387 0.96774194 0.98387097
 0.96774194 0.9516129  0.93442623 0.83606557]

mean value: 0.9367265996827076

key: train_accuracy
value: [0.94604317 0.9028777  0.91546763 0.98021583 0.96942446 0.97661871
 0.9676259  0.97122302 0.97666068 0.88150808]

mean value: 0.9487665164098523

key: test_fscore
value: [0.96666667 0.89655172 0.92537313 0.93939394 0.96774194 0.98360656
 0.96666667 0.94915254 0.93333333 0.8       ]

mean value: 0.9328486499760696

key: train_fscore
value: [0.94423792 0.89370079 0.92153589 0.98039216 0.96903461 0.97674419
 0.96727273 0.97153025 0.97649186 0.86746988]

mean value: 0.9468410268529506

key: test_precision
value: [1.         0.96296296 0.86111111 0.88571429 0.96774194 1.
 1.         1.         0.96551724 1.        ]

mean value: 0.9643047536651541

key: train_precision
value: [0.97692308 0.98695652 0.85981308 0.97173145 0.98154982 0.97153025
 0.97794118 0.96126761 0.98181818 0.98630137]

mean value: 0.9655832529931669

key: test_recall
value: [0.93548387 0.83870968 1.         1.         0.96774194 0.96774194
 0.93548387 0.90322581 0.90322581 0.66666667]

mean value: 0.9118279569892473

key: train_recall
value: [0.91366906 0.81654676 0.99280576 0.98920863 0.95683453 0.98201439
 0.95683453 0.98201439 0.97122302 0.77419355]

mean value: 0.9335344627523787

key: test_roc_auc
value: [0.96774194 0.90322581 0.91935484 0.93548387 0.96774194 0.98387097
 0.96774194 0.9516129  0.93494624 0.83333333]

mean value: 0.936505376344086

key: train_roc_auc
value: [0.94604317 0.9028777  0.91546763 0.98021583 0.96942446 0.97661871
 0.9676259  0.97122302 0.97665094 0.88170109]

mean value: 0.9487848430932674

key: test_jcc
value: [0.93548387 0.8125     0.86111111 0.88571429 0.9375     0.96774194
 0.93548387 0.90322581 0.875      0.66666667]

mean value: 0.8780427547363031

key: train_jcc
value: [0.8943662  0.80782918 0.85448916 0.96153846 0.93992933 0.95454545
 0.93661972 0.94463668 0.9540636  0.76595745]

mean value: 0.9013975235029617

MCC on Blind test: 0.13

Accuracy on Blind test: 0.31
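In fold 10 of the PassiveAggressiveClassifier block, `test_roc_auc` (0.83333333) differs from `test_accuracy` (0.83606557) but equals the balanced accuracy of the hard predictions. For a binary classifier scored on 0/1 labels, ROC AUC reduces to (TPR + TNR)/2. The logged values are therefore consistent with `roc_auc` being computed from `predict()` output rather than from probabilities, although the log alone cannot confirm this. The sketch below rebuilds that fold's labels from its logged metrics (61 samples, precision 1.0, recall 20/30, accuracy 51/61); the label order within the fold is arbitrary.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Fold 10 rebuilt from its logged metrics: TP=20, FN=10, FP=0, TN=31
y_true = [1] * 30 + [0] * 31
y_pred = [1] * 20 + [0] * 10 + [0] * 31

acc = accuracy_score(y_true, y_pred)               # 51/61
auc_hard = roc_auc_score(y_true, y_pred)           # AUC computed from 0/1 labels
bal_acc = balanced_accuracy_score(y_true, y_pred)  # (TPR + TNR) / 2

print(round(acc, 8), round(auc_hard, 8), round(bal_acc, 8))
```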

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.0138278  0.01442528 0.01401639 0.01388168 0.01454473 0.0136025
 0.0134716  0.01331687 0.01288104 0.01246238]

mean value: 0.013643026351928711

key: score_time
value: [0.01061082 0.01157665 0.01069307 0.01073432 0.01059294 0.01056838
 0.01065302 0.01038933 0.01044464 0.01039839]

mean value: 0.010666155815124511

key: test_mcc
value: [0.93743687 0.87278605 0.93743687 0.90748521 0.90369611 0.87831007
 1.         0.78446454 0.72318666 0.50305191]

mean value: 0.8447854282682599

key: train_mcc
value: [0.90882979 0.95705746 0.95025527 0.94305636 0.92239227 0.89154571
 0.94604929 0.77463214 0.83507476 0.45405525]

mean value: 0.858294830889454

key: test_accuracy
value: [0.96774194 0.93548387 0.96774194 0.9516129  0.9516129  0.93548387
 1.         0.88709677 0.85245902 0.70491803]

mean value: 0.9154151242728715

key: train_accuracy
value: [0.95323741 0.97841727 0.97482014 0.97122302 0.96043165 0.9442446
 0.97302158 0.87589928 0.91202873 0.67504488]

mean value: 0.9218368572646372

key: test_fscore
value: [0.96666667 0.9375     0.96875    0.95384615 0.95081967 0.93103448
 1.         0.89552239 0.86956522 0.57142857]

mean value: 0.9045133152282165

key: train_fscore
value: [0.95149254 0.97864769 0.97526502 0.97069597 0.95925926 0.94183865
 0.97307002 0.88924559 0.91846922 0.52493438]

mean value: 0.908291832592524

key: test_precision
value: [1.         0.90909091 0.93939394 0.91176471 0.96666667 1.
 1.         0.83333333 0.78947368 1.        ]

mean value: 0.9349723238577727

key: train_precision
value: [0.98837209 0.96830986 0.95833333 0.98880597 0.98854962 0.98431373
 0.97132616 0.80289855 0.85448916 0.98039216]

mean value: 0.9485790636020202

key: test_recall
value: [0.93548387 0.96774194 1.         1.         0.93548387 0.87096774
 1.         0.96774194 0.96774194 0.4       ]

mean value: 0.9045161290322581

key: train_recall
value: [0.91726619 0.98920863 0.99280576 0.95323741 0.93165468 0.9028777
 0.97482014 0.99640288 0.99280576 0.35842294]

mean value: 0.9009502075758747

key: test_roc_auc
value: [0.96774194 0.93548387 0.96774194 0.9516129  0.9516129  0.93548387
 1.         0.88709677 0.85053763 0.7       ]

mean value: 0.914731182795699

key: train_roc_auc
value: [0.95323741 0.97841727 0.97482014 0.97122302 0.96043165 0.9442446
 0.97302158 0.87589928 0.91217349 0.67561435]

mean value: 0.9219082798277507

key: test_jcc
value: [0.93548387 0.88235294 0.93939394 0.91176471 0.90625    0.87096774
 1.         0.81081081 0.76923077 0.4       ]

mean value: 0.8426254779397568

key: train_jcc
value: [0.90747331 0.95818815 0.95172414 0.9430605  0.92170819 0.89007092
 0.94755245 0.80057803 0.84923077 0.35587189]

mean value: 0.8525458343695811

MCC on Blind test: 0.14

Accuracy on Blind test: 0.32
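MCC can be reproduced from confusion-matrix counts: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). In fold 1 of the SGDClassifier block, the logged precision of 1.0 (FP = 0), recall of 0.93548387 (29/31) and accuracy of 0.96774194 (60/62) imply TP=29, FN=2, FP=0 and TN=31. The sketch below recomputes `test_mcc` for that fold from those counts and cross-checks it against scikit-learn. The `mcc_from_counts` helper is illustrative and does not come from the script.

```python
import math
from sklearn.metrics import matthews_corrcoef

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Counts implied by the logged fold-1 metrics of the SGDClassifier run
tp, fn, fp, tn = 29, 2, 0, 31
mcc = mcc_from_counts(tp, tn, fp, fn)

# Cross-check against scikit-learn on equivalent label vectors
y_true = [1] * 31 + [0] * 31
y_pred = [1] * tp + [0] * fn + [0] * tn
print(round(mcc, 8), round(matthews_corrcoef(y_true, y_pred), 8))
```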

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.11406326 0.10405397 0.10169864 0.10252857 0.09923482 0.10144997
 0.09957933 0.10238481 0.10498977 0.10294104]

mean value: 0.10329241752624511

key: score_time
value: [0.01416016 0.01535344 0.01559019 0.01440263 0.01463914 0.01422262
 0.01545978 0.01572537 0.01503325 0.0141983 ]

mean value: 0.014878487586975098

key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.90748521 0.93743687
 1.         0.90369611 1.         0.8688172 ]

mean value: 0.9396896154994742

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129  0.9516129  0.96774194
 1.         0.9516129  1.         0.93442623]

mean value: 0.9692490745637229

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98412698 0.96774194 0.98360656 0.95384615 0.95384615 0.96666667
 1.         0.95081967 1.         0.93333333]

mean value: 0.9693987456811359

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96875    0.96774194 1.         0.91176471 0.91176471 1.
 1.         0.96666667 1.         0.93333333]

mean value: 0.9660021347248577

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 0.96774194 1.         1.         0.93548387
 1.         0.93548387 1.         0.93333333]

mean value: 0.9739784946236559

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129  0.9516129  0.96774194
 1.         0.9516129  1.         0.9344086 ]

mean value: 0.969247311827957

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96875    0.9375     0.96774194 0.91176471 0.91176471 0.93548387
 1.         0.90625    1.         0.875     ]

mean value: 0.9414255218216319

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.31
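Every model in this section scores far lower on the blind test than in cross-validation. Here the blind test gives MCC 0.09 and accuracy 0.31, against a CV `test_mcc` mean of about 0.94. Accuracy below 0.5 alongside a slightly positive MCC deserves a sanity check. One scenario that produces this pattern is a class-imbalanced blind set on which the model labels most rows as the minority class. The sketch below computes both metrics from hypothetical confusion-matrix counts. These are not the actual blind-set results, which the log does not report.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero, matching
    scikit-learn's matthews_corrcoef convention.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified rows."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical blind set: 10 positives, 90 negatives; the model labels
# most rows positive. These counts are illustrative, not from the log.
tp, fn, fp, tn = 9, 1, 69, 21
print(f"accuracy={accuracy(tp, tn, fp, fn):.2f}, MCC={mcc(tp, tn, fp, fn):.3f}")
# accuracy=0.30, MCC=0.097
```

MCC uses all four confusion-matrix cells, so it stays near zero whenever predictions barely beat chance on both classes, regardless of how the classes are balanced.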

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
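The OOB warnings for this model come from `BaggingClassifier` with `oob_score=True` and its default `n_estimators=10`. Under bootstrap sampling, each training row is left out of a given bag with probability (1 - 1/n)^n ≈ 0.368. With only 10 bags, a small fraction of rows end up in-bag for every estimator and receive no out-of-bag prediction. Their OOB decision function becomes 0/0, which explains the `invalid value encountered in true_divide` message.

The back-of-the-envelope sketch below makes two assumptions. First, it assumes about 556 training rows per fold, which is consistent with the reported train accuracy 0.99820144 ≈ 555/556. Second, it assumes the defaults `bootstrap=True` and `max_samples=1.0`. The helper name is hypothetical.

```python
def expected_missing_oob_fraction(n_train, n_estimators):
    """Probability that a training row is in-bag for every estimator.

    Assumes bootstrap sampling of n_train rows with replacement
    (BaggingClassifier defaults: bootstrap=True, max_samples=1.0).
    Rows with no out-of-bag prediction get a 0/0 OOB score, which
    triggers the UserWarning and RuntimeWarning seen in the log.
    """
    p_out_of_bag = (1 - 1 / n_train) ** n_train  # about 0.368 for large n
    p_in_bag = 1 - p_out_of_bag
    return p_in_bag ** n_estimators


n_train = 556  # assumed rows per training fold (555/556 = 0.99820144)
for n_est in (10, 50, 100):
    frac = expected_missing_oob_fraction(n_train, n_est)
    print(f"n_estimators={n_est}: ~{frac * n_train:.2f} rows without OOB score")
```

Under these assumptions, roughly 5 to 6 rows per fold lack an OOB score at `n_estimators=10`, and essentially none do at 100. Raising the estimator count would therefore likely silence both warnings, for example with `BaggingClassifier(n_estimators=100, n_jobs=10, oob_score=True, random_state=42)`. The CV metrics reported above are computed on held-out folds, not from `oob_score_`.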

key: fit_time
value: [0.03851056 0.03913617 0.03798318 0.04739237 0.0397296  0.04984927
 0.04227161 0.05067539 0.05400753 0.04927731]

mean value: 0.04488329887390137

key: score_time
value: [0.02179551 0.02289391 0.02226377 0.01712132 0.03155065 0.0246129
 0.03463507 0.02148271 0.02362227 0.01659489]

mean value: 0.02365729808807373

key: test_mcc
value: [1.         0.90369611 1.         0.93743687 0.87096774 0.90748521
 0.83914639 0.96824584 0.93635873 0.90204573]

mean value: 0.9265382629263172

key: train_mcc
value: [0.99640932 0.99640932 0.99280576 0.99640932 0.98563702 0.99280576
 0.99640932 0.99640932 0.98923442 0.99284434]

mean value: 0.9935373910332435

key: test_accuracy
value: [1.         0.9516129  1.         0.96774194 0.93548387 0.9516129
 0.91935484 0.98387097 0.96721311 0.95081967]

mean value: 0.9627710206240084

key: train_accuracy
value: [0.99820144 0.99820144 0.99640288 0.99820144 0.99280576 0.99640288
 0.99820144 0.99820144 0.994614   0.99640934]

mean value: 0.9967642044353745

key: test_fscore
value: [1.         0.95081967 1.         0.96875    0.93548387 0.94915254
 0.91803279 0.98360656 0.96875    0.94915254]

mean value: 0.9623747972106947

key: train_fscore
value: [0.9981982  0.9981982  0.99640288 0.9981982  0.99277978 0.99640288
 0.99820467 0.9981982  0.994614   0.99640288]

mean value: 0.9967599880734039

key: test_precision
value: [1.         0.96666667 1.         0.93939394 0.93548387 1.
 0.93333333 1.         0.93939394 0.96551724]

mean value: 0.9679788991134931

key: train_precision
value: [1.         1.         0.99640288 1.         0.99637681 0.99640288
 0.99641577 1.         0.99283154 1.        ]

mean value: 0.9978429878817844

key: test_recall
value: [1.         0.93548387 1.         1.         0.93548387 0.90322581
 0.90322581 0.96774194 1.         0.93333333]

mean value: 0.9578494623655914

key: train_recall
value: [0.99640288 0.99640288 0.99640288 0.99640288 0.98920863 0.99640288
 1.         0.99640288 0.99640288 0.99283154]

mean value: 0.9956860318197055

key: test_roc_auc
value: [1.         0.9516129  1.         0.96774194 0.93548387 0.9516129
 0.91935484 0.98387097 0.96666667 0.95053763]

mean value: 0.9626881720430108

key: train_roc_auc
value: [0.99820144 0.99820144 0.99640288 0.99820144 0.99280576 0.99640288
 0.99820144 0.99820144 0.99461721 0.99641577]

mean value: 0.996765168510353

key: test_jcc
value: [1.         0.90625    1.         0.93939394 0.87878788 0.90322581
 0.84848485 0.96774194 0.93939394 0.90322581]

mean value: 0.9286504154447703

key: train_jcc
value: [0.99640288 0.99640288 0.99283154 0.99640288 0.98566308 0.99283154
 0.99641577 0.99640288 0.98928571 0.99283154]

mean value: 0.9935470701779591

MCC on Blind test: 0.07

Accuracy on Blind test: 0.58

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.12759042 0.22521901 0.21887374 0.2211132  0.17997479 0.20122313
 0.19672465 0.20488429 0.276335   0.25733685]

mean value: 0.21092751026153564

key: score_time
value: [0.01269174 0.02497721 0.02092695 0.02029276 0.01257658 0.0126636
 0.01265192 0.02021074 0.02772164 0.02012014]

mean value: 0.0184833288192749

key: test_mcc
value: [0.90748521 0.61807005 0.7130241  0.80813523 0.77784447 0.77459667
 0.61807005 0.80645161 0.57576971 0.70780713]

mean value: 0.7307254226729265

key: train_mcc
value: [0.87086426 0.86386843 0.84312418 0.83904739 0.85318614 0.85376169
 0.85720277 0.84009387 0.86412027 0.86022912]

mean value: 0.8545498119930119

key: test_accuracy
value: [0.9516129  0.80645161 0.85483871 0.90322581 0.88709677 0.88709677
 0.80645161 0.90322581 0.78688525 0.85245902]

mean value: 0.8639344262295082

key: train_accuracy
value: [0.9352518  0.93165468 0.92086331 0.91906475 0.92625899 0.92625899
 0.92805755 0.91906475 0.93177738 0.92998205]

mean value: 0.9268234245637601

key: test_fscore
value: [0.94915254 0.81818182 0.86153846 0.90625    0.89230769 0.88888889
 0.81818182 0.90322581 0.8        0.84210526]

mean value: 0.8679832291081068

key: train_fscore
value: [0.93617021 0.93286219 0.92307692 0.92091388 0.92768959 0.92819615
 0.92982456 0.92173913 0.93286219 0.93097345]

mean value: 0.9284308286107671

key: test_precision
value: [1.         0.77142857 0.82352941 0.87878788 0.85294118 0.875
 0.77142857 0.90322581 0.76470588 0.88888889]

mean value: 0.8529936187573759

key: train_precision
value: [0.92307692 0.91666667 0.89795918 0.90034364 0.9100346  0.90443686
 0.90753425 0.89225589 0.91666667 0.91958042]

mean value: 0.9088555103251448

key: test_recall
value: [0.90322581 0.87096774 0.90322581 0.93548387 0.93548387 0.90322581
 0.87096774 0.90322581 0.83870968 0.8       ]

mean value: 0.8864516129032258

key: train_recall
value: [0.94964029 0.94964029 0.94964029 0.94244604 0.94604317 0.95323741
 0.95323741 0.95323741 0.94964029 0.94265233]

mean value: 0.9489414919677162

key: test_roc_auc
value: [0.9516129  0.80645161 0.85483871 0.90322581 0.88709677 0.88709677
 0.80645161 0.90322581 0.78602151 0.8516129 ]

mean value: 0.8637634408602151

key: train_roc_auc
value: [0.9352518  0.93165468 0.92086331 0.91906475 0.92625899 0.92625899
 0.92805755 0.91906475 0.93180939 0.92995926]

mean value: 0.9268243469740336

key: test_jcc
value: [0.90322581 0.69230769 0.75675676 0.82857143 0.80555556 0.8
 0.69230769 0.82352941 0.66666667 0.72727273]

mean value: 0.7696193737654838

key: train_jcc
value: [0.88       0.87417219 0.85714286 0.8534202  0.86513158 0.86601307
 0.86885246 0.85483871 0.87417219 0.87086093]

mean value: 0.8664604170132448

MCC on Blind test: 0.22

Accuracy on Blind test: 0.49

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.26798725 0.26752782 0.26489639 0.26649332 0.25984526 0.26162148
 0.2612102  0.26817322 0.26268578 0.26635146]

mean value: 0.26467921733856203

key: score_time
value: [0.00845337 0.00842595 0.00839472 0.0083878  0.00851393 0.00833416
 0.00913382 0.00835061 0.00875974 0.00896358]

mean value: 0.008571767807006836

key: test_mcc
value: [1.         0.90369611 1.         0.93743687 0.93743687 0.90748521
 0.96824584 0.96824584 0.96770777 0.8688172 ]

mean value: 0.9459071710309553

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9516129  1.         0.96774194 0.96774194 0.9516129
 0.98387097 0.98387097 0.98360656 0.93442623]

mean value: 0.9724484399788472

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95081967 1.         0.96875    0.96875    0.94915254
 0.98412698 0.98360656 0.98412698 0.93333333]

mean value: 0.972266607346838

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.96666667 1.         0.93939394 0.93939394 1.
 0.96875    1.         0.96875    0.93333333]

mean value: 0.9716287878787879

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.93548387 1.         1.         1.         0.90322581
 1.         0.96774194 1.         0.93333333]

mean value: 0.9739784946236559

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9516129  1.         0.96774194 0.96774194 0.9516129
 0.98387097 0.98387097 0.98333333 0.9344086 ]

mean value: 0.9724193548387097

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90625    1.         0.93939394 0.93939394 0.90322581
 0.96875    0.96774194 0.96875    0.875     ]

mean value: 0.9468505620723363

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.14

Accuracy on Blind test: 0.63

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
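QDA fits a separate covariance matrix per class. scikit-learn warns "Variables are collinear" when a class's centred feature matrix is rank-deficient. One plausible contributor here is the `OneHotEncoder()` step, though this is an assumption the log does not confirm. With the default `drop=None`, the dummy columns of each categorical feature sum to 1, so after centring they are linearly dependent. The NumPy sketch below uses a hypothetical three-level column (standing in for something like `ss_class`) to show the rank loss, and how dropping one level restores full rank.

```python
import numpy as np

# Hypothetical categorical column with three levels (illustrative only).
levels = np.array(["helix", "strand", "loop", "helix", "loop", "strand", "helix"])
categories = np.unique(levels)

# Full one-hot encoding, as with OneHotEncoder() and its default drop=None.
full = (levels[:, None] == categories[None, :]).astype(float)

# Dropping the first level, as with OneHotEncoder(drop='first').
dropped = full[:, 1:]

# QDA works on centred features; the full dummy columns always sum to 1,
# so once centred they sum to 0 and one column is redundant.
full_c = full - full.mean(axis=0)
dropped_c = dropped - dropped.mean(axis=0)

print(full.shape[1], np.linalg.matrix_rank(full_c))        # 3 columns, rank 2
print(dropped.shape[1], np.linalg.matrix_rank(dropped_c))  # 2 columns, rank 2
```

In this pipeline, the equivalent change would be `OneHotEncoder(drop='first')`. An alternative is to regularise the per-class covariance estimates with `QuadraticDiscriminantAnalysis(reg_param=...)`. Either change may alter the scores. Collinearity among the scaled numerical features, or a dummy column that is constant within one class, could still trigger the warning.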

key: fit_time
value: [0.01149559 0.01360273 0.01408195 0.01396298 0.0143764  0.01411986
 0.01366615 0.01368761 0.01428699 0.01439714]

mean value: 0.013767743110656738

key: score_time
value: [0.01090598 0.01095629 0.01090288 0.01166439 0.01109648 0.01160717
 0.01097107 0.01158404 0.01094365 0.01160645]

mean value: 0.011223840713500976

key: test_mcc
value: [0.3799803  0.51119863 0.54006172 0.74161985 0.56853524 0.56493268
 0.50083542 0.43852901 0.72318666 0.76533557]

mean value: 0.5734215093600435

key: train_mcc
value: [0.4932785  0.76196204 0.69278522 0.72409686 0.56120987 0.54686874
 0.76885315 0.49611447 0.76738608 0.73356387]

mean value: 0.6546118797369623

key: test_accuracy
value: [0.64516129 0.74193548 0.72580645 0.85483871 0.75806452 0.74193548
 0.74193548 0.66129032 0.85245902 0.86885246]

mean value: 0.759227921734532

key: train_accuracy
value: [0.69784173 0.87230216 0.82553957 0.8471223  0.75359712 0.73021583
 0.87410072 0.69964029 0.87791741 0.85098743]

mean value: 0.8029264559626984

key: test_fscore
value: [0.73170732 0.77777778 0.78481013 0.87323944 0.69387755 0.79487179
 0.77142857 0.74698795 0.86956522 0.88235294]

mean value: 0.7926618685748723

key: train_fscore
value: [0.76731302 0.88455285 0.85099846 0.86614173 0.68649886 0.78753541
 0.88709677 0.76837725 0.88741722 0.87010955]

mean value: 0.8256041120420929

key: test_precision
value: [0.58823529 0.68292683 0.64583333 0.775      0.94444444 0.65957447
 0.69230769 0.59615385 0.78947368 0.78947368]

mean value: 0.7163423276131415

key: train_precision
value: [0.62387387 0.80712166 0.74262735 0.77030812 0.94339623 0.64953271
 0.80409357 0.62528217 0.82208589 0.77222222]

mean value: 0.756054378747134

key: test_recall
value: [0.96774194 0.90322581 1.         1.         0.5483871  1.
 0.87096774 1.         0.96774194 1.        ]

mean value: 0.9258064516129032

key: train_recall
value: [0.99640288 0.97841727 0.99640288 0.98920863 0.53956835 1.
 0.98920863 0.99640288 0.96402878 0.99641577]

mean value: 0.9446056058379103

key: test_roc_auc
value: [0.64516129 0.74193548 0.72580645 0.85483871 0.75806452 0.74193548
 0.74193548 0.66129032 0.85053763 0.87096774]

mean value: 0.759247311827957

key: train_roc_auc
value: [0.69784173 0.87230216 0.82553957 0.8471223  0.75359712 0.73021583
 0.87410072 0.69964029 0.87807174 0.85072587]

mean value: 0.8029157319305846

key: test_jcc
value: [0.57692308 0.63636364 0.64583333 0.775      0.53125    0.65957447
 0.62790698 0.59615385 0.76923077 0.78947368]

mean value: 0.660770979104448

key: train_jcc
value: [0.62247191 0.79300292 0.74064171 0.76388889 0.52264808 0.64953271
 0.79710145 0.62387387 0.79761905 0.7700831 ]

mean value: 0.7080863692848516

MCC on Blind test: 0.18

Accuracy on Blind test: 0.6

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])
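The Pipeline repr above chains a ColumnTransformer with the classifier. The transformer applies MinMaxScaler to the numeric columns and OneHotEncoder to the seven categorical columns, and passes any remaining columns through. The minimal sketch below reproduces the same structure on a tiny frame. Only three column names are borrowed from the log; the data and the target are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; only the column names come from the log
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": np.linspace(0.0, 39.0, 40),
    "rsa": rng.uniform(0.0, 1.0, size=40),
    "active_site": ["yes", "no"] * 20,
})
y = (X["ligand_distance"] < 20).astype(int)  # made-up binary target

num_cols = list(X.select_dtypes(include="number").columns)
cat_cols = list(X.select_dtypes(exclude="number").columns)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),  # rescale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",  # columns not listed above are kept unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", RidgeClassifier(random_state=42))])
pipe.fit(X, y)
preds = pipe.predict(X)
```

Because the scaler and encoder are fitted inside the pipeline, each cross-validation fold learns its own scaling from its training split only.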

key: fit_time
value: [0.02095389 0.03017879 0.03077292 0.03034782 0.03044152 0.03048611
 0.03019238 0.03031898 0.0302968  0.03035188]

mean value: 0.02943410873413086

key: score_time
value: [0.0190351  0.02024627 0.02113628 0.01070428 0.01898575 0.01937699
 0.02058935 0.01084757 0.0107224  0.01985407]

mean value: 0.017149806022644043

key: test_mcc
value: [0.96824584 0.81325006 0.83914639 0.87831007 0.96824584 0.93548387
 0.90369611 0.93743687 0.80516731 0.8688172 ]

mean value: 0.8917799559713326

key: train_mcc
value: [0.93900081 0.93890359 0.91007783 0.9352518  0.92088714 0.92808157
 0.91007783 0.92805755 0.92820949 0.93182991]

mean value: 0.9270377524969889

key: test_accuracy
value: [0.98387097 0.90322581 0.91935484 0.93548387 0.98387097 0.96774194
 0.9516129  0.96774194 0.90163934 0.93442623]

mean value: 0.9448968799576943

key: train_accuracy
value: [0.96942446 0.96942446 0.95503597 0.9676259  0.96043165 0.96402878
 0.95503597 0.96402878 0.96409336 0.96588869]

mean value: 0.9635018017901656

key: test_fscore
value: [0.98412698 0.90909091 0.92063492 0.93939394 0.98412698 0.96774194
 0.95081967 0.96666667 0.9        0.93333333]

mean value: 0.9455935344988755

key: train_fscore
value: [0.96969697 0.96958855 0.95495495 0.9676259  0.96057348 0.96415771
 0.95495495 0.96402878 0.96389892 0.96613191]

mean value: 0.9635612113921358

key: test_precision
value: [0.96875    0.85714286 0.90625    0.88571429 0.96875    0.96774194
 0.96666667 1.         0.93103448 0.93333333]

mean value: 0.9385383561099634

key: train_precision
value: [0.96113074 0.96441281 0.9566787  0.9676259  0.95714286 0.96071429
 0.9566787  0.96402878 0.9673913  0.96099291]

mean value: 0.9616796985424773

key: test_recall
value: [1.         0.96774194 0.93548387 1.         1.         0.96774194
 0.93548387 0.93548387 0.87096774 0.93333333]

mean value: 0.9546236559139785

key: train_recall
value: [0.97841727 0.97482014 0.95323741 0.9676259  0.96402878 0.9676259
 0.95323741 0.96402878 0.96043165 0.97132616]

mean value: 0.9654779402284623

key: test_roc_auc
value: [0.98387097 0.90322581 0.91935484 0.93548387 0.98387097 0.96774194
 0.9516129  0.96774194 0.90215054 0.9344086 ]

mean value: 0.9449462365591399

key: train_roc_auc
value: [0.96942446 0.96942446 0.95503597 0.9676259  0.96043165 0.96402878
 0.95503597 0.96402878 0.9640868  0.96587891]

mean value: 0.9635001676078492

key: test_jcc
value: [0.96875    0.83333333 0.85294118 0.88571429 0.96875    0.9375
 0.90625    0.93548387 0.81818182 0.875     ]

mean value: 0.8981904484667768

key: train_jcc
value: [0.94117647 0.94097222 0.9137931  0.93728223 0.92413793 0.93079585
 0.9137931  0.93055556 0.93031359 0.93448276]

mean value: 0.9297302811483933
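Each key/value/mean value block matches the shape of the dictionary returned by scikit-learn's cross_validate with return_train_score=True. That dictionary holds fit_time and score_time, followed by test_<name> and train_<name> for every scorer, with one value per fold. The sketch below assumes the blocks were produced this way and runs on synthetic data. The scorer names are chosen to mirror the log keys, and the StratifiedKFold settings are guesses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (the real feature matrix is not reproduced here)
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical scoring dict; names chosen so the keys read test_mcc, train_jcc, ...
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(RidgeClassifier(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("\nmean value:", np.mean(value), "\n")
```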

MCC on Blind test: 0.23

Accuracy on Blind test: 0.47

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])
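RidgeClassifierCV(cv=10) fits noticeably more slowly in the block below (fit_time around 0.20 s per fold) than RidgeClassifier did (around 0.03 s). A plausible reason is that an integer cv makes RidgeClassifierCV run its own 10-fold search over its candidate alphas (default (0.1, 1.0, 10.0)) inside every outer fold, and only then refit. The sketch below illustrates this on synthetic data and assumes the default alphas.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X = MinMaxScaler().fit_transform(X)  # mirrors the MinMaxScaler step in the pipeline

# With cv=10 each candidate alpha is scored by 10-fold cross-validation,
# and the best one is then refit on all of X
clf = RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), cv=10).fit(X, y)
print("selected alpha:", clf.alpha_)
```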

key: fit_time
value: [0.20507646 0.19756031 0.19703841 0.19701099 0.20378065 0.19817948
 0.19686937 0.19685602 0.19804025 0.20452499]

mean value: 0.1994936943054199

key: score_time
value: [0.01948881 0.02093601 0.01906753 0.02151203 0.02097845 0.01082182
 0.02040362 0.01091933 0.02004528 0.01085353]

mean value: 0.017502641677856444

key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.90748521 0.96824584 0.96824584
 0.96824584 0.93743687 0.87082935 0.83655914]

mean value: 0.9171654872995563

key: train_mcc
value: [0.94254361 0.94619622 0.94609826 0.94966486 0.94609826 0.94966486
 0.93890359 0.95339163 0.95691189 0.9534734 ]

mean value: 0.948294657254694

key: test_accuracy
value: [0.98387097 0.91935484 0.9516129  0.9516129  0.98387097 0.98387097
 0.98387097 0.96774194 0.93442623 0.91803279]

mean value: 0.9578265468006346

key: train_accuracy
value: [0.97122302 0.97302158 0.97302158 0.97482014 0.97302158 0.97482014
 0.96942446 0.97661871 0.97845601 0.97666068]

mean value: 0.9741087919610452

key: test_fscore
value: [0.98412698 0.92307692 0.95238095 0.95384615 0.98412698 0.98360656
 0.98412698 0.96666667 0.93333333 0.91803279]

mean value: 0.9583324325947277

key: train_fscore
value: [0.97142857 0.97326203 0.97316637 0.97491039 0.97316637 0.97491039
 0.96958855 0.97682709 0.97841727 0.97690941]

mean value: 0.9742586454574466

key: test_precision
value: [0.96875    0.88235294 0.9375     0.91176471 0.96875    1.
 0.96875    1.         0.96551724 0.90322581]

mean value: 0.9506610694889747

key: train_precision
value: [0.96453901 0.96466431 0.96797153 0.97142857 0.96797153 0.97142857
 0.96441281 0.96819788 0.97841727 0.96830986]

mean value: 0.9687341337990163

key: test_recall
value: [1.         0.96774194 0.96774194 1.         1.         0.96774194
 1.         0.93548387 0.90322581 0.93333333]

mean value: 0.9675268817204301

key: train_recall
value: [0.97841727 0.98201439 0.97841727 0.97841727 0.97841727 0.97841727
 0.97482014 0.98561151 0.97841727 0.98566308]

mean value: 0.9798612722725046

key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129  0.9516129  0.98387097 0.98387097
 0.98387097 0.96774194 0.93494624 0.91827957]

mean value: 0.9579032258064516

key: train_roc_auc
value: [0.97122302 0.97302158 0.97302158 0.97482014 0.97302158 0.97482014
 0.96942446 0.97661871 0.97845594 0.97664449]

mean value: 0.9741071658801991

key: test_jcc
value: [0.96875    0.85714286 0.90909091 0.91176471 0.96875    0.96774194
 0.96875    0.93548387 0.875      0.84848485]

mean value: 0.921095912705258

key: train_jcc
value: [0.94444444 0.94791667 0.94773519 0.95104895 0.94773519 0.95104895
 0.94097222 0.95470383 0.95774648 0.95486111]

mean value: 0.9498213041443461

MCC on Blind test: 0.2

Accuracy on Blind test: 0.44
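The SettingWithCopyWarning messages in the Ridge ClassifierCV block come from katg_config.py lines 143 and 146, where smnc_CT and smnc_BT are sorted in place. The log does not show how those frames are built. Assuming they are slices of a larger results table, a common fix is to take an explicit .copy() before any in-place edit, or to rebind the sorted result. The results frame below is hypothetical. It is filled with the rounded mean test_mcc and the blind-test MCC logged for three of the models.

```python
import pandas as pd

# Hypothetical results table built from values printed in this log
results = pd.DataFrame({
    "model": ["Ridge Classifier", "Ridge ClassifierCV", "Logistic Regression"],
    "test_mcc": [0.89, 0.92, 0.83],  # rounded mean test_mcc per model
    "bts_mcc": [0.23, 0.20, 0.21],  # MCC on Blind test per model
})

# Option 1: take an explicit copy so the in-place sort acts on an independent frame
smnc_CT = results[["model", "test_mcc"]].copy()
smnc_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: avoid inplace and rebind the sorted result instead
smnc_BT = results[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)
print(smnc_CT.iloc[0]["model"], smnc_BT.iloc[0]["model"])
```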

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04743552 0.02349114 0.02685213 0.02750683 0.0252192  0.0279355
 0.03556037 0.04037642 0.03911209 0.03533268]

mean value: 0.032882189750671385

key: score_time
value: [0.01078486 0.01099777 0.01306605 0.01077437 0.01067662 0.01066399
 0.01071048 0.01073122 0.01087689 0.01084948]

mean value: 0.011013174057006836

key: test_mcc
value: [0.96824584 0.7130241  0.83914639 0.90748521 0.79471941 0.93548387
 0.71004695 0.80813523 0.77096774 0.87082935]

mean value: 0.8318084093587729

key: train_mcc
value: [0.87424213 0.85278837 0.83904739 0.84537297 0.85265591 0.84192273
 0.83904739 0.85646981 0.84627216 0.84586123]

mean value: 0.8493680080976538

key: test_accuracy
value: [0.98387097 0.85483871 0.91935484 0.9516129  0.88709677 0.96774194
 0.85483871 0.90322581 0.8852459  0.93442623]

mean value: 0.9142252776308831

key: train_accuracy
value: [0.93705036 0.92625899 0.91906475 0.92266187 0.92625899 0.92086331
 0.91906475 0.92805755 0.92280072 0.92280072]

mean value: 0.9244882011805278

key: test_fscore
value: [0.98412698 0.86153846 0.92063492 0.95384615 0.89855072 0.96774194
 0.85714286 0.9        0.8852459  0.93548387]

mean value: 0.9164311810018015

key: train_fscore
value: [0.93761141 0.92717584 0.92091388 0.92307692 0.92691622 0.92170819
 0.92091388 0.92907801 0.92416226 0.92389381]

mean value: 0.9255450426062092

key: test_precision
value: [0.96875    0.82352941 0.90625    0.91176471 0.81578947 0.96774194
 0.84375    0.93103448 0.9        0.90625   ]

mean value: 0.8974860009573761

key: train_precision
value: [0.92932862 0.91578947 0.90034364 0.91814947 0.91872792 0.91197183
 0.90034364 0.91608392 0.90657439 0.91258741]

mean value: 0.9129900316323134

key: test_recall
value: [1.         0.90322581 0.93548387 1.         1.         0.96774194
 0.87096774 0.87096774 0.87096774 0.96666667]

mean value: 0.9386021505376344

key: train_recall
value: [0.94604317 0.93884892 0.94244604 0.92805755 0.9352518  0.93165468
 0.94244604 0.94244604 0.94244604 0.93548387]

mean value: 0.9385124158737526

key: test_roc_auc
value: [0.98387097 0.85483871 0.91935484 0.9516129  0.88709677 0.96774194
 0.85483871 0.90322581 0.88548387 0.93494624]

mean value: 0.9143010752688172

key: train_roc_auc
value: [0.93705036 0.92625899 0.91906475 0.92266187 0.92625899 0.92086331
 0.91906475 0.92805755 0.92283592 0.92277791]

mean value: 0.9244894407055001

key: test_jcc
value: [0.96875    0.75675676 0.85294118 0.91176471 0.81578947 0.9375
 0.75       0.81818182 0.79411765 0.87878788]

mean value: 0.8484589456822429
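The test_jcc values can be cross-checked against test_fscore. For binary labels, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The snippet below applies that identity to the Logistic Regression folds logged above. It also recomputes the "mean value" as a plain arithmetic mean over the folds. It assumes jcc is the binary Jaccard score, which the log does not state.

```python
# Per-fold values copied from the Logistic Regression block above
test_fscore = [0.98412698, 0.86153846, 0.92063492, 0.95384615, 0.89855072,
               0.96774194, 0.85714286, 0.9, 0.8852459, 0.93548387]
test_jcc = [0.96875, 0.75675676, 0.85294118, 0.91176471, 0.81578947,
            0.9375, 0.75, 0.81818182, 0.79411765, 0.87878788]

# Binary Jaccard from F1: J = F1 / (2 - F1)
derived_jcc = [f / (2 - f) for f in test_fscore]
max_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))

# "mean value" in the log is the arithmetic mean over the 10 folds
mean_jcc = sum(test_jcc) / len(test_jcc)
print(f"largest deviation: {max_diff:.1e}, mean test_jcc: {mean_jcc:.6f}")
```

The remaining deviations come only from the eight-decimal rounding in the printed values.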

key: train_jcc
value: [0.88255034 0.86423841 0.8534202  0.85714286 0.86378738 0.85478548
 0.8534202  0.86754967 0.85901639 0.85855263]

mean value: 0.8614463542047712

MCC on Blind test: 0.21

Accuracy on Blind test: 0.53

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
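The lbfgs warnings recommend either raising max_iter or scaling the data. The numeric columns in the earlier pipelines are already MinMax-scaled, and this model presumably uses the same preprocessing, which leaves raising max_iter (default 100) as the remaining lever. The sketch below runs on synthetic data. The value max_iter=5000 is an arbitrary choice, and it has not been tested on the real features.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Scale inside the pipeline and raise max_iter above the lbfgs default of 100
clf = make_pipeline(
    MinMaxScaler(),
    LogisticRegressionCV(max_iter=5000, random_state=42),
)

# Record any ConvergenceWarning instead of letting it print
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence)
```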
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
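The `Model_name`, `Model func` and `Running model pipeline` entries suggest that the script wraps each (name, estimator) pair in this list in a Pipeline and cross-validates them one at a time. The sketch below shows one possible form for such a loop. Everything in it is assumed for illustration and is not the script's actual code: the helper `run_models`, the three-model list, the scaler-only preprocessing, and the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def run_models(models, X, y, n_splits=10, random_state=42):
    """Cross-validate each (name, estimator) pair in a scaling pipeline; return mean MCC per name."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    mcc_scorer = make_scorer(matthews_corrcoef)
    results = {}
    for name, clf in models:
        print("Model_name:", name)
        pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", clf)])
        scores = cross_validate(pipe, X, y, cv=cv, scoring=mcc_scorer)
        results[name] = scores["test_score"].mean()
    return results


# Hypothetical short model list and synthetic data
models = [
    ("Logistic Regression", LogisticRegression(max_iter=1000, random_state=42)),
    ("Gaussian NB", GaussianNB()),
    ("Decision Tree", DecisionTreeClassifier(random_state=42)),
]
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
results = run_models(models, X, y)
print(results)
```

Every model receives the same `cv` object, so all models are scored on identical folds and their mean MCC values can be compared directly.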
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
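The `prep` step above sends numerical columns through `MinMaxScaler` and categorical columns through `OneHotEncoder`. Because of `remainder='passthrough'`, any columns not listed are appended unchanged. The sketch below reproduces this layout on a hypothetical six-row DataFrame. The column names come from the logged pipeline, but the values and the `extra_feature` column are invented. `sparse_threshold=0` is added only so that the output is a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy rows; column names borrowed from the logged pipeline, values invented
df = pd.DataFrame({
    "ligand_distance": [3.2, 10.5, 7.1, 25.0, 15.4, 4.8],
    "rsa": [0.05, 0.40, 0.22, 0.91, 0.63, 0.10],
    "ss_class": ["helix", "sheet", "loop", "loop", "helix", "sheet"],
    "active_site": ["yes", "no", "no", "no", "yes", "no"],
    "extra_feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # hypothetical; unlisted, so passed through
})

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",  # unlisted columns are appended after the transformed ones
    sparse_threshold=0,       # always return a dense NumPy array
)

X = prep.fit_transform(df)
print(X.shape)  # (6, 8): 2 scaled + 3 ss_class + 2 active_site + 1 passthrough
```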

key: fit_time
value: [0.72030234 0.71688986 0.82023787 0.6914432  0.76275826 0.84668159
 0.7217288  0.70235157 0.860708   0.69467163]

mean value: 0.7537773132324219

key: score_time
value: [0.01084113 0.01207328 0.01228642 0.01954889 0.0122242  0.01225781
 0.01232028 0.0122869  0.01229262 0.01234746]

mean value: 0.012847900390625

key: test_mcc
value: [0.93743687 0.90369611 1.         0.90369611 0.87096774 0.93548387
 0.90369611 0.87278605 0.93649139 0.87082935]

mean value: 0.9135083615653431

key: train_mcc
value: [0.95329292 0.94634322 0.95685929 0.95685929 0.97482645 0.96043787
 0.93195016 0.96048758 0.96050901 0.98205307]

mean value: 0.9583618868215811

key: test_accuracy
value: [0.96774194 0.9516129  1.         0.9516129  0.93548387 0.96774194
 0.9516129  0.93548387 0.96721311 0.93442623]

mean value: 0.9562929666842941

key: train_accuracy
value: [0.97661871 0.97302158 0.97841727 0.97841727 0.98741007 0.98021583
 0.96582734 0.98021583 0.98025135 0.99102334]

mean value: 0.9791418570708963

key: test_fscore
value: [0.96875    0.95238095 1.         0.95238095 0.93548387 0.96774194
 0.95238095 0.93333333 0.96666667 0.93548387]

mean value: 0.9564602534562212

key: train_fscore
value: [0.97674419 0.97335702 0.97833935 0.97849462 0.98738739 0.98025135
 0.96625222 0.980322   0.98025135 0.99102334]

mean value: 0.9792422819398573

key: test_precision
value: [0.93939394 0.9375     1.         0.9375     0.93548387 0.96774194
 0.9375     0.96551724 1.         0.90625   ]

mean value: 0.9526886987224863

key: train_precision
value: [0.97153025 0.96140351 0.98188406 0.975      0.98916968 0.97849462
 0.95438596 0.97508897 0.97849462 0.99280576]

mean value: 0.975825742653484

key: test_recall
value: [1.         0.96774194 1.         0.96774194 0.93548387 0.96774194
 0.96774194 0.90322581 0.93548387 0.96666667]

mean value: 0.9611827956989247

key: train_recall
value: [0.98201439 0.98561151 0.97482014 0.98201439 0.98561151 0.98201439
 0.97841727 0.98561151 0.98201439 0.98924731]

mean value: 0.9827376808230834

key: test_roc_auc
value: [0.96774194 0.9516129  1.         0.9516129  0.93548387 0.96774194
 0.9516129  0.93548387 0.96774194 0.93494624]

mean value: 0.9563978494623656

key: train_roc_auc
value: [0.97661871 0.97302158 0.97841727 0.97841727 0.98741007 0.98021583
 0.96582734 0.98021583 0.98025451 0.99102653]

mean value: 0.9791424924576468

key: test_jcc
value: [0.93939394 0.90909091 1.         0.90909091 0.87878788 0.9375
 0.90909091 0.875      0.93548387 0.87878788]

mean value: 0.9172226295210166

key: train_jcc
value: [0.95454545 0.94809689 0.95759717 0.95789474 0.97508897 0.96126761
 0.9347079  0.96140351 0.96126761 0.98220641]

mean value: 0.959407624783067
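The `key`/`value`/`mean value` records above match the structure of a `cross_validate` result with `return_train_score=True`. It starts with `fit_time` and `score_time`, followed by a `test_`/`train_` pair for each scorer. Each entry holds one value per fold (ten here), and `mean value` is the average of those fold values. The sketch below assumes a scoring dict whose names reproduce the logged keys. The log does not show the real script's scorers, so they may differ; for example, `roc_auc` may be computed another way. The final lines check the mean against the logged `test_mcc` folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumed scoring dict; its names yield keys such as test_mcc and train_jcc
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Same "key / value / mean value" layout as the log
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")

# 'mean value' is the plain mean of the fold scores, e.g. the logged test_mcc folds
logged_test_mcc = np.array([0.93743687, 0.90369611, 1.0, 0.90369611, 0.87096774,
                            0.93548387, 0.90369611, 0.87278605, 0.93649139, 0.87082935])
print(round(logged_test_mcc.mean(), 5))  # 0.91351
```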

MCC on Blind test: 0.14

Accuracy on Blind test: 0.35
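The log does not show how the blind-test figures are computed. The sketch below assumes they come from `matthews_corrcoef` and `accuracy_score` applied to predicted labels and rounded to two decimals. It uses toy labels that can be checked by hand (TP = 1, FN = 1, TN = 2, FP = 0), and `blind_test_report` is a hypothetical helper name.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef


def blind_test_report(y_true, y_pred):
    """Print MCC and accuracy in the log's format and return the unrounded values."""
    mcc = matthews_corrcoef(y_true, y_pred)
    acc = accuracy_score(y_true, y_pred)
    print(f"MCC on Blind test: {round(mcc, 2)}\n")
    print(f"Accuracy on Blind test: {round(acc, 2)}\n")
    return mcc, acc


# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) = 2 / sqrt(12), about 0.577
mcc, acc = blind_test_report([1, 1, 0, 0], [1, 0, 0, 0])
# prints "MCC on Blind test: 0.58" and "Accuracy on Blind test: 0.75"
```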

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01091671 0.01010871 0.00867534 0.00845599 0.0082767  0.00828934
 0.00830674 0.00770044 0.00745106 0.00766039]

mean value: 0.008584141731262207

key: score_time
value: [0.01369882 0.00904274 0.0089016  0.00860572 0.00860953 0.00857306
 0.00860238 0.00795913 0.00795174 0.00807285]

mean value: 0.009001755714416504

key: test_mcc
value: [0.78446454 0.51856298 0.71004695 0.84266484 0.7190925  0.67883359
 0.51639778 0.84266484 0.67204301 0.73763441]

mean value: 0.7022405434817621

key: train_mcc
value: [0.70405758 0.72340077 0.71605437 0.70505422 0.73033396 0.71230395
 0.70505422 0.70180672 0.72391206 0.73070576]

mean value: 0.7152683609552583

key: test_accuracy
value: [0.88709677 0.75806452 0.85483871 0.91935484 0.85483871 0.83870968
 0.75806452 0.91935484 0.83606557 0.86885246]

mean value: 0.8495240613432047

key: train_accuracy
value: [0.84532374 0.86151079 0.85791367 0.85251799 0.86510791 0.85611511
 0.85251799 0.85071942 0.86175943 0.86535009]

mean value: 0.8568836133965358

key: test_fscore
value: [0.89552239 0.76923077 0.85714286 0.92307692 0.86567164 0.84375
 0.75409836 0.91525424 0.83870968 0.86666667]

mean value: 0.852912352133119

key: train_fscore
value: [0.85901639 0.86371681 0.85968028 0.85304659 0.86631016 0.85714286
 0.85304659 0.85309735 0.86371681 0.86535009]

mean value: 0.8594123948387209

key: test_precision
value: [0.83333333 0.73529412 0.84375    0.88235294 0.80555556 0.81818182
 0.76666667 0.96428571 0.83870968 0.86666667]

mean value: 0.8354796490932639

key: train_precision
value: [0.78915663 0.85017422 0.84912281 0.85       0.85865724 0.85106383
 0.85       0.83972125 0.85017422 0.86690647]

mean value: 0.845497666835835

key: test_recall
value: [0.96774194 0.80645161 0.87096774 0.96774194 0.93548387 0.87096774
 0.74193548 0.87096774 0.83870968 0.86666667]

mean value: 0.8737634408602151

key: train_recall
value: [0.94244604 0.87769784 0.8705036  0.85611511 0.87410072 0.86330935
 0.85611511 0.86690647 0.87769784 0.86379928]

mean value: 0.8748691369485058

key: test_roc_auc
value: [0.88709677 0.75806452 0.85483871 0.91935484 0.85483871 0.83870968
 0.75806452 0.91935484 0.83602151 0.8688172 ]

mean value: 0.8495161290322581

key: train_roc_auc
value: [0.84532374 0.86151079 0.85791367 0.85251799 0.86510791 0.85611511
 0.85251799 0.85071942 0.86178799 0.86535288]

mean value: 0.8568867486655837

key: test_jcc
value: [0.81081081 0.625      0.75       0.85714286 0.76315789 0.72972973
 0.60526316 0.84375    0.72222222 0.76470588]

mean value: 0.747178255489014

key: train_jcc
value: [0.75287356 0.76012461 0.75389408 0.74375    0.76415094 0.75
 0.74375    0.74382716 0.76012461 0.76265823]

mean value: 0.7535153197137231

MCC on Blind test: 0.21

Accuracy on Blind test: 0.57

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00817347 0.00792408 0.00842571 0.00832367 0.00831985 0.00922894
 0.00865865 0.00853586 0.00858712 0.00871468]

mean value: 0.008489203453063966

key: score_time
value: [0.0080893  0.00804472 0.00845718 0.00870252 0.00849962 0.00911498
 0.00868034 0.00858021 0.00858331 0.00869846]

mean value: 0.00854506492614746

key: test_mcc
value: [0.61807005 0.65372045 0.45374261 0.71004695 0.51856298 0.71004695
 0.42023032 0.74193548 0.54251915 0.57419355]

mean value: 0.5943068479116385

key: train_mcc
value: [0.61176415 0.63718965 0.62604511 0.60075441 0.62596408 0.60075441
 0.65528703 0.62262853 0.64839945 0.64106733]

mean value: 0.6269854139141487

key: test_accuracy
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
 0.70967742 0.87096774 0.7704918  0.78688525]

mean value: 0.7960602855631941

key: train_accuracy
value: [0.8057554  0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
 0.82733813 0.81115108 0.82405745 0.82046679]

mean value: 0.8133732870077367

key: test_fscore
value: [0.79310345 0.8358209  0.71186441 0.85245902 0.76923077 0.85245902
 0.71875    0.87096774 0.76666667 0.78688525]

mean value: 0.7958207207099356

key: train_fscore
value: [0.80851064 0.82186949 0.81090909 0.79927667 0.8115942  0.79927667
 0.83098592 0.81415929 0.82624113 0.82269504]

mean value: 0.814551814377158

key: test_precision
value: [0.85185185 0.77777778 0.75       0.86666667 0.73529412 0.86666667
 0.6969697  0.87096774 0.79310345 0.77419355]

mean value: 0.7983491516178162

key: train_precision
value: [0.7972028  0.80622837 0.81985294 0.80363636 0.81751825 0.80363636
 0.8137931  0.80139373 0.81468531 0.81403509]

mean value: 0.8091982321605484

key: test_recall
value: [0.74193548 0.90322581 0.67741935 0.83870968 0.80645161 0.83870968
 0.74193548 0.87096774 0.74193548 0.8       ]

mean value: 0.7961290322580645

key: train_recall
value: [0.82014388 0.8381295  0.80215827 0.79496403 0.8057554  0.79496403
 0.84892086 0.82733813 0.8381295  0.83154122]

mean value: 0.8202044815760295

key: test_roc_auc
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
 0.70967742 0.87096774 0.77096774 0.78709677]

mean value: 0.7961290322580645

key: train_roc_auc
value: [0.8057554  0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
 0.82733813 0.81115108 0.82408267 0.82044687]

mean value: 0.8133738170753719

key: test_jcc
value: [0.65714286 0.71794872 0.55263158 0.74285714 0.625      0.74285714
 0.56097561 0.77142857 0.62162162 0.64864865]

mean value: 0.6641111891208169

key: train_jcc
value: [0.67857143 0.69760479 0.68195719 0.66566265 0.68292683 0.66566265
 0.71084337 0.68656716 0.70392749 0.69879518]

mean value: 0.6872518746851146

MCC on Blind test: 0.18

Accuracy on Blind test: 0.52
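Each model block in this log prints a `Running model pipeline` repr, then per-fold scores under keys such as `fit_time`, `test_mcc` and `train_mcc`, with ten values per key. The sketch below shows one way to produce output in this shape with scikit-learn's `cross_validate`. The script's own code is not shown in the log, so several parts are assumptions: 10-fold stratified splitting, `make_scorer`-wrapped metrics, and the scorer names `mcc`, `fscore` and `jcc`. The data is synthetic random values, and only the column names echo the log. One detail fits the log: `make_scorer(roc_auc_score)` scores hard 0/1 predictions. That would explain why the logged `test_roc_auc` values track `test_accuracy` so closely on balanced folds, though the log does not confirm it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data: values are random, only the column names echo the log
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "coil"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 20).astype(int)

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Same layout as the logged 'prep' step: scale numeric, one-hot encode categorical
prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegression(random_state=42))])

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard 0/1 predictions
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same "key / value / mean value" layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```

To reuse the same preprocessing with a different model, swap `LogisticRegression` for any estimator in the logged model list.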

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00818872 0.00765157 0.00800991 0.00798917 0.00798535 0.00793529
 0.00807238 0.00833344 0.00816226 0.00822878]

mean value: 0.008055686950683594

key: score_time
value: [0.01362538 0.01161528 0.01154208 0.01181722 0.01149821 0.01183558
 0.01181483 0.01180124 0.01576948 0.01188374]

mean value: 0.012320303916931152

key: test_mcc
value: [0.7130241  0.61418277 0.5483871  0.77459667 0.51856298 0.74348441
 0.58834841 0.61807005 0.60818119 0.57576971]

mean value: 0.6302607385125394

key: train_mcc
value: [0.7014797  0.74464768 0.73388892 0.71949894 0.75180343 0.71341277
 0.73033396 0.70918848 0.73474672 0.73420349]

mean value: 0.7273204091578028

key: test_accuracy
value: [0.85483871 0.80645161 0.77419355 0.88709677 0.75806452 0.87096774
 0.79032258 0.80645161 0.80327869 0.78688525]

mean value: 0.8138551031200423

key: train_accuracy
value: [0.85071942 0.87230216 0.86690647 0.85971223 0.87589928 0.85611511
 0.86510791 0.85431655 0.86714542 0.86535009]

mean value: 0.8633574648360306

key: test_fscore
value: [0.84745763 0.8125     0.77419355 0.8852459  0.76923077 0.86666667
 0.80597015 0.79310345 0.8        0.77192982]

mean value: 0.8126297935133517

key: train_fscore
value: [0.84990958 0.8716094  0.86594203 0.85869565 0.87567568 0.85185185
 0.86388385 0.85137615 0.86446886 0.85875706]

mean value: 0.8612170116983378

key: test_precision
value: [0.89285714 0.78787879 0.77419355 0.9        0.73529412 0.89655172
 0.75       0.85185185 0.82758621 0.81481481]

mean value: 0.8231028194471236

key: train_precision
value: [0.85454545 0.87636364 0.87226277 0.8649635  0.87725632 0.8778626
 0.87179487 0.86891386 0.88059701 0.9047619 ]

mean value: 0.8749321930550784

key: test_recall
value: [0.80645161 0.83870968 0.77419355 0.87096774 0.80645161 0.83870968
 0.87096774 0.74193548 0.77419355 0.73333333]

mean value: 0.8055913978494623

key: train_recall
value: [0.84532374 0.86690647 0.85971223 0.85251799 0.87410072 0.82733813
 0.85611511 0.83453237 0.84892086 0.8172043 ]

mean value: 0.848267192697455

key: test_roc_auc
value: [0.85483871 0.80645161 0.77419355 0.88709677 0.75806452 0.87096774
 0.79032258 0.80645161 0.80376344 0.78602151]

mean value: 0.8138172043010753

key: train_roc_auc
value: [0.85071942 0.87230216 0.86690647 0.85971223 0.87589928 0.85611511
 0.86510791 0.85431655 0.86711276 0.86543668]

mean value: 0.8633628581006163

key: test_jcc
value: [0.73529412 0.68421053 0.63157895 0.79411765 0.625      0.76470588
 0.675      0.65714286 0.66666667 0.62857143]

mean value: 0.6862288073123987

key: train_jcc
value: [0.73899371 0.7724359  0.76357827 0.75238095 0.77884615 0.74193548
 0.76038339 0.74121406 0.76129032 0.75247525]

mean value: 0.7563533487181033

MCC on Blind test: 0.16

Accuracy on Blind test: 0.57

Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01835275 0.01664305 0.01745725 0.01704788 0.01895404 0.01934195
 0.01825833 0.01938081 0.01911664 0.01921248]

mean value: 0.0183765172958374

key: score_time
value: [0.00940108 0.01006126 0.00929928 0.00980639 0.01032066 0.01062155
 0.01041937 0.01055121 0.01047111 0.01046944]

mean value: 0.010142135620117187

key: test_mcc
value: [0.96824584 0.66226618 0.62471615 0.7190925  0.7284928  0.80813523
 0.50083542 0.80645161 0.63939757 0.81978229]

mean value: 0.7277415590359753

key: train_mcc
value: [0.85345163 0.77632088 0.79541168 0.777078   0.76906554 0.75930753
 0.79995316 0.76580581 0.77932355 0.78519796]

mean value: 0.78609157351081

key: test_accuracy
value: [0.98387097 0.82258065 0.80645161 0.85483871 0.85483871 0.90322581
 0.74193548 0.90322581 0.81967213 0.90163934]

mean value: 0.859227921734532

key: train_accuracy
value: [0.92625899 0.88489209 0.89568345 0.88489209 0.88129496 0.87589928
 0.89748201 0.8794964  0.88689408 0.89048474]

mean value: 0.890327809565633

key: test_fscore
value: [0.98412698 0.84057971 0.82352941 0.86567164 0.86956522 0.90625
 0.77142857 0.90322581 0.82539683 0.90909091]

mean value: 0.8698865077586886

key: train_fscore
value: [0.92794376 0.89189189 0.90068493 0.89225589 0.88851351 0.88403361
 0.90289608 0.88701518 0.89303905 0.89608177]

mean value: 0.8964355683391803

key: test_precision
value: [0.96875    0.76315789 0.75675676 0.80555556 0.78947368 0.87878788
 0.69230769 0.90322581 0.8125     0.83333333]

mean value: 0.8203848602140198

key: train_precision
value: [0.90721649 0.84076433 0.85947712 0.83860759 0.83757962 0.829653
 0.85760518 0.83492063 0.84565916 0.8538961 ]

mean value: 0.8505379240652493

key: test_recall
value: [1.         0.93548387 0.90322581 0.93548387 0.96774194 0.93548387
 0.87096774 0.90322581 0.83870968 1.        ]

mean value: 0.9290322580645161

key: train_recall
value: [0.94964029 0.94964029 0.94604317 0.95323741 0.94604317 0.94604317
 0.95323741 0.94604317 0.94604317 0.94265233]

mean value: 0.9478623552770686

key: test_roc_auc
value: [0.98387097 0.82258065 0.80645161 0.85483871 0.85483871 0.90322581
 0.74193548 0.90322581 0.81935484 0.90322581]

mean value: 0.8593548387096774

key: train_roc_auc
value: [0.92625899 0.88489209 0.89568345 0.88489209 0.88129496 0.87589928
 0.89748201 0.8794964  0.88700008 0.89039091]

mean value: 0.8903290271008999

key: test_jcc
value: [0.96875    0.725      0.7        0.76315789 0.76923077 0.82857143
 0.62790698 0.82352941 0.7027027  0.83333333]

mean value: 0.7742182517083968

key: train_jcc
value: [0.86557377 0.80487805 0.81931464 0.80547112 0.7993921  0.79216867
 0.82298137 0.7969697  0.80674847 0.8117284 ]

mean value: 0.8125226282348854

MCC on Blind test: 0.26

Accuracy on Blind test: 0.5

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])

key: fit_time
value: [1.65558004 1.55461335 1.68518972 1.59982467 1.59840369 1.78556776
 1.98289418 1.71361303 1.71010733 1.56034899]

mean value: 1.6846142768859864

key: score_time
value: [0.01405716 0.02408385 0.01391459 0.01108027 0.01359916 0.01913881
 0.01201797 0.01144147 0.01147699 0.01196384]

mean value: 0.014277410507202149

key: test_mcc
value: [1.         0.90369611 0.93548387 0.96824584 0.93743687 0.90369611
 0.93548387 0.93743687 0.87082935 0.90215054]

mean value: 0.9294459430210258

key: train_mcc
value: [0.99283145 0.98561151 0.99283145 0.98921503 0.99283145 0.98921503
 0.98202074 0.99640932 0.99284416 0.99641577]

mean value: 0.9910225917811445

key: test_accuracy
value: [1.         0.9516129  0.96774194 0.98387097 0.96774194 0.9516129
 0.96774194 0.96774194 0.93442623 0.95081967]

mean value: 0.9643310417768377

key: train_accuracy
value: [0.99640288 0.99280576 0.99640288 0.99460432 0.99640288 0.99460432
 0.99100719 0.99820144 0.99640934 0.99820467]

mean value: 0.9955045658266923

key: test_fscore
value: [1.         0.95238095 0.96774194 0.98412698 0.96875    0.95081967
 0.96774194 0.96666667 0.93333333 0.95081967]

mean value: 0.9642381151737973

key: train_fscore
value: [0.99638989 0.99280576 0.99638989 0.99459459 0.99638989 0.99459459
 0.99102334 0.9981982  0.99638989 0.99820467]

mean value: 0.9954980716751404

key: test_precision
value: [1.         0.9375     0.96774194 0.96875    0.93939394 0.96666667
 0.96774194 1.         0.96551724 0.93548387]

mean value: 0.9648795589375401

key: train_precision
value: [1.         0.99280576 1.         0.99638989 1.         0.99638989
 0.98924731 1.         1.         1.        ]

mean value: 0.9974832850617142

key: test_recall
value: [1.         0.96774194 0.96774194 1.         1.         0.93548387
 0.96774194 0.93548387 0.90322581 0.96666667]

mean value: 0.9644086021505376

key: train_recall
value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576
 0.99280576 0.99640288 0.99280576 0.99641577]

mean value: 0.9935264691472628

key: test_roc_auc
value: [1.         0.9516129  0.96774194 0.98387097 0.96774194 0.9516129
 0.96774194 0.96774194 0.93494624 0.95107527]

mean value: 0.9644086021505377

key: train_roc_auc
value: [0.99640288 0.99280576 0.99640288 0.99460432 0.99640288 0.99460432
 0.99100719 0.99820144 0.99640288 0.99820789]

mean value: 0.995504241767876

key: test_jcc
value: [1.         0.90909091 0.9375     0.96875    0.93939394 0.90625
 0.9375     0.93548387 0.875      0.90625   ]

mean value: 0.931521871945259

key: train_jcc
value: [0.99280576 0.98571429 0.99280576 0.98924731 0.99280576 0.98924731
 0.98220641 0.99640288 0.99280576 0.99641577]

mean value: 0.9910456984954045

MCC on Blind test: 0.15

Accuracy on Blind test: 0.35
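The MLP block logs a `ConvergenceWarning` from scikit-learn's stochastic optimiser. With `max_iter=500`, the solver repeatedly stopped at the iteration cap rather than on its tolerance criterion. The sketch below shows one way to detect that warning programmatically. It also shows one commonly used alternative, `early_stopping=True`, which holds out part of the training data as a validation set. The sketch uses synthetic `make_classification` data and is not what the logged script does. Whether early stopping would change the scores on this dataset is untested.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic data, unrelated to the dataset in the log
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def fit_and_check(model):
    """Fit `model`; return (converged, iterations run)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    converged = not any(issubclass(w.category, ConvergenceWarning)
                        for w in caught)
    return converged, model.n_iter_


# Deliberately tiny budget: the optimiser hits max_iter and warns
capped = fit_and_check(MLPClassifier(max_iter=5, random_state=42))

# early_stopping holds out 10% of the training data and stops once the
# validation score stops improving, usually well before max_iter
early = fit_and_check(MLPClassifier(max_iter=2000, early_stopping=True,
                                    random_state=42))
print(capped, early)
```

The other obvious option is to raise `max_iter`. Early stopping also shrinks the effective training set by `validation_fraction`, which is 10% by default.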

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01399827 0.01331663 0.01154613 0.01056981 0.01088071 0.00967002
 0.00976062 0.00975561 0.0097971  0.00941205]

mean value: 0.010870695114135742

key: score_time
value: [0.01116037 0.00968766 0.00949836 0.00883269 0.00868964 0.00786233
 0.00786996 0.00779343 0.0077889  0.0078299 ]

mean value: 0.008701324462890625

key: test_mcc
value: [1.         0.87096774 1.         0.96824584 0.90369611 0.87831007
 0.87831007 0.96824584 0.96774194 0.90215054]

mean value: 0.9337668133579895

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93548387 1.         0.98387097 0.9516129  0.93548387
 0.93548387 0.98387097 0.98360656 0.95081967]

mean value: 0.9660232681121099
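The per-fold test accuracies are exact fractions. Recovering them with Python's `fractions` module shows denominators of 62 and 61, i.e. test folds of 62 and 61 rows. That pattern would match a 10-fold split of roughly 618 rows (8 folds of 62, then 2 of 61). The row count is an inference from these numbers, not something this section of the log states, and `KFold` is used here only to show the fold-size arithmetic.

```python
from fractions import Fraction

import numpy as np
from sklearn.model_selection import KFold

# Fold-level test accuracies copied from the Decision Tree block above (folds 4, 5, 9, 10)
logged = [0.98387097, 0.9516129, 0.98360656, 0.95081967]
fracs = [Fraction(v).limit_denominator(100) for v in logged]  # closest fraction with denominator <= 100
for value, frac in zip(logged, fracs):
    print(value, "=", frac)

# Hypothetical row count: 618 rows in 10 folds gives test folds of 62 and 61 rows
fold_sizes = [len(test) for _, test in KFold(n_splits=10).split(np.zeros((618, 1)))]
print(fold_sizes)
```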

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.93548387 1.         0.98412698 0.95238095 0.93103448
 0.93103448 0.98360656 0.98360656 0.95081967]

mean value: 0.9652093559878165

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.93548387 1.         0.96875    0.9375     1.
 1.         1.         1.         0.93548387]

mean value: 0.9777217741935483

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.93548387 1.         1.         0.96774194 0.87096774
 0.87096774 0.96774194 0.96774194 0.96666667]

mean value: 0.9547311827956989

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.93548387 1.         0.98387097 0.9516129  0.93548387
 0.93548387 0.98387097 0.98387097 0.95107527]

mean value: 0.9660752688172043

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.87878788 1.         0.96875    0.90909091 0.87096774
 0.87096774 0.96774194 0.96774194 0.90625   ]

mean value: 0.9340298142717498

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
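The key/value/mean blocks above look like the dictionary returned by scikit-learn's `cross_validate` when it is given a dictionary of scorers and `return_train_score=True`. Each `test_<name>` and `train_<name>` key then holds one value per fold. The sketch below works under that assumption. The scorer names mirror the log keys, while the stratified 10-fold splitter and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Scorer names chosen to reproduce the keys in the log (test_mcc, train_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # default make_scorer scores hard predictions
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=200, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # assumed splitter
results = cross_validate(DecisionTreeClassifier(random_state=42), X, y,
                         cv=cv, scoring=scoring, return_train_score=True)

for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```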

MCC on Blind test: 0.01

Accuracy on Blind test: 0.2
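The two blind-test lines report scores on data held out from cross-validation, rounded to two decimals. Below is a minimal sketch of such a step. It assumes the model is refitted on the full training portion and scored once on a separate set. `X_blind` and `y_blind` are hypothetical names, and a `train_test_split` of synthetic data stands in for however the script builds its blind set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=42)
# Hypothetical stand-in for the blind set: a held-out slice never used in CV
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("Accuracy on Blind test:", blind_acc)
```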

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10142112 0.10165668 0.10126591 0.10151219 0.11299324 0.11322856
 0.108289   0.1019156  0.10539865 0.1026175 ]

mean value: 0.10502984523773193

key: score_time
value: [0.0171802  0.0173862  0.01719642 0.01748347 0.01896811 0.01896906
 0.01711893 0.01859283 0.01831841 0.01735854]

mean value: 0.01785721778869629

key: test_mcc
value: [1.         0.90369611 0.93548387 0.93548387 0.93743687 0.93548387
 0.93743687 0.96824584 0.96770777 0.90215054]

mean value: 0.9423125607021228

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9516129  0.96774194 0.96774194 0.96774194 0.96774194
 0.96774194 0.98387097 0.98360656 0.95081967]

mean value: 0.9708619777895293

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 0.96774194 0.96774194 0.96875    0.96774194
 0.96666667 0.98360656 0.98412698 0.95081967]

mean value: 0.9709576639134413

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9375     0.96774194 0.96774194 0.93939394 0.96774194
 1.         1.         0.96875    0.93548387]

mean value: 0.9684353616813295

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 0.96774194 0.96774194 1.         0.96774194
 0.93548387 0.96774194 1.         0.96666667]

mean value: 0.9740860215053764

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9516129  0.96774194 0.96774194 0.96774194 0.96774194
 0.96774194 0.98387097 0.98333333 0.95107527]

mean value: 0.9708602150537635

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 0.9375     0.9375     0.93939394 0.9375
 0.93548387 0.96774194 0.96875    0.90625   ]

mean value: 0.9439210654936462

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
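For the positive class of a binary problem, test_jcc and test_fscore are not independent. With J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), it follows that J = F1/(2 - F1). The check below uses the per-fold values copied from the Extra Trees block above.

```python
import numpy as np

# Per-fold test_fscore and test_jcc for Extra Trees, copied from the log above
fscore = np.array([1.0, 0.95238095, 0.96774194, 0.96774194, 0.96875,
                   0.96774194, 0.96666667, 0.98360656, 0.98412698, 0.95081967])
jcc = np.array([1.0, 0.90909091, 0.9375, 0.9375, 0.93939394,
                0.9375, 0.93548387, 0.96774194, 0.96875, 0.90625])

def jaccard_from_f1(f1):
    """Jaccard index of the positive class implied by its F1 score."""
    return f1 / (2.0 - f1)

print(np.round(jaccard_from_f1(fscore), 8))
```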

MCC on Blind test: 0.2

Accuracy on Blind test: 0.36

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00819397 0.00782299 0.00878263 0.00824928 0.007725   0.00857353
 0.00780129 0.00821066 0.008883   0.00848746]

mean value: 0.008272981643676758

key: score_time
value: [0.00791669 0.00839043 0.00856495 0.00861716 0.00864053 0.00859261
 0.00863576 0.00865197 0.00859213 0.00803781]

mean value: 0.00846400260925293

key: test_mcc
value: [0.81325006 0.82199494 0.83914639 0.90369611 0.87096774 0.90369611
 0.81325006 0.7284928  0.74460444 0.80475071]

mean value: 0.8243849367718851

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.90322581 0.90322581 0.91935484 0.9516129  0.93548387 0.9516129
 0.90322581 0.85483871 0.86885246 0.90163934]

mean value: 0.9093072448439978

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.89655172 0.89285714 0.91803279 0.95238095 0.93548387 0.95081967
 0.89655172 0.83636364 0.86206897 0.89655172]

mean value: 0.9037662199516902

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96296296 1.         0.93333333 0.9375     0.93548387 0.96666667
 0.96296296 0.95833333 0.92592593 0.92857143]

mean value: 0.9511740484724356

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.83870968 0.80645161 0.90322581 0.96774194 0.93548387 0.93548387
 0.83870968 0.74193548 0.80645161 0.86666667]

mean value: 0.8640860215053763

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90322581 0.90322581 0.91935484 0.9516129  0.93548387 0.9516129
 0.90322581 0.85483871 0.86989247 0.90107527]

mean value: 0.9093548387096775

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.8125     0.80645161 0.84848485 0.90909091 0.87878788 0.90625
 0.8125     0.71875    0.75757576 0.8125    ]

mean value: 0.826289100684262

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.26
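The first Extra Tree fold has test_precision 0.96296296 (26/27), test_recall 0.83870968 (26/31) and test_accuracy 0.90322581 (56/62). These ratios are consistent with TP = 26, FN = 5, FP = 1, TN = 30. The counts are inferred from the logged ratios, not printed by the script. Recomputing every metric from them reproduces the logged fold-1 values, which shows how MCC, F1 and Jaccard are all functions of the same confusion matrix.

```python
import math

# Confusion-matrix counts inferred from the fold-1 ratios of the Extra Tree block
tp, fn, fp, tn = 26, 5, 1, 30

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
# Matthews correlation coefficient from the 2x2 confusion matrix
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(round(mcc, 8), round(f1, 8), round(jaccard, 8))
```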

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.33802533 1.34034443 1.34689403 1.33327985 1.32815957 1.34741807
 1.36258483 1.35150194 1.35220432 1.37553906]

mean value: 1.3475951433181763

key: score_time
value: [0.09532094 0.15330195 0.09112287 0.0915432  0.09900188 0.09554839
 0.09749842 0.0989244  0.09722352 0.09352469]

mean value: 0.10130102634429931

key: test_mcc
value: [1.         0.90369611 0.96824584 0.96824584 0.93743687 0.96824584
 1.         0.96824584 0.96770777 0.8688172 ]

mean value: 0.9550641303879139

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9516129  0.98387097 0.98387097 0.96774194 0.98387097
 1.         0.98387097 0.98360656 0.93442623]

mean value: 0.9772871496562665

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 0.98412698 0.98412698 0.96875    0.98360656
 1.         0.98360656 0.98412698 0.93333333]

mean value: 0.9774058352849336

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9375     0.96875    0.96875    0.93939394 1.
 1.         1.         0.96875    0.93333333]

mean value: 0.9716477272727273

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 1.         0.96774194 1.         0.93333333]

mean value: 0.9836559139784946

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9516129  0.98387097 0.98387097 0.96774194 0.98387097
 1.         0.98387097 0.98333333 0.9344086 ]

mean value: 0.9772580645161291
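test_roc_auc tracks test_accuracy almost exactly across these models. That is what one would expect if ROC AUC were scored on predicted labels (for example via `make_scorer(roc_auc_score)`, which calls `predict`) rather than on probabilities. With hard 0/1 predictions the ROC curve has a single interior point, so AUC = (TPR + TNR)/2, which is balanced accuracy. Balanced accuracy equals plain accuracy whenever a fold has equal class counts. That the script scores AUC this way is an inference from the numbers. The sketch uses counts consistent with Random Forest fold 9 above: 31 positives all recalled and 29 of 30 negatives correct.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Hard labels consistent with the fold-9 counts: TP=31, FN=0, TN=29, FP=1
y_true = np.array([1] * 31 + [0] * 30)
y_pred = np.array([1] * 31 + [0] * 29 + [1])

auc = roc_auc_score(y_true, y_pred)  # scored on labels, not probabilities
bal_acc = balanced_accuracy_score(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
print(round(auc, 8), round(bal_acc, 8), round(acc, 8))
```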

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 0.96875    0.96875    0.93939394 0.96774194
 1.         0.96774194 0.96875    0.875     ]

mean value: 0.9565218719452591

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.19

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
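The FutureWarning for this model comes from `max_features='auto'` in the Random Forest2 configuration. The warning text itself says to set `max_features='sqrt'`, or drop the parameter because 'sqrt' is the classifier default, to keep the same behaviour. The sketch below builds the equivalent estimator with the other arguments copied from the log and synthetic data in place of the real features.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same settings as 'Random Forest2', with the deprecated 'auto' replaced by 'sqrt'
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

X, y = make_classification(n_samples=200, random_state=42)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised during fit
    rf2.fit(X, y)
print(round(rf2.oob_score_, 3))
```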

key: fit_time
value: [0.89337206 0.89832807 0.98853326 0.97143817 0.95260191 0.94139814
 0.88369918 0.9011116  0.89748955 0.93211508]

mean value: 0.9260087013244629

key: score_time
value: [0.21353126 0.18774438 0.24319863 0.28244352 0.24419403 0.22724342
 0.2297473  0.25225329 0.2684927  0.27329707]

mean value: 0.24221456050872803

key: test_mcc
value: [0.96824584 0.84266484 0.96824584 0.93743687 0.93743687 0.96824584
 1.         0.93548387 0.96770777 0.8688172 ]

mean value: 0.9394284931358869

key: train_mcc
value: [0.96073627 0.95025527 0.97124816 0.96058703 0.96768225 0.95693359
 0.96412858 0.96778244 0.95713569 0.97137405]

mean value: 0.9627863336198357

key: test_accuracy
value: [0.98387097 0.91935484 0.98387097 0.96774194 0.96774194 0.98387097
 1.         0.96774194 0.98360656 0.93442623]

mean value: 0.9692226335272343

key: train_accuracy
value: [0.98021583 0.97482014 0.98561151 0.98021583 0.98381295 0.97841727
 0.98201439 0.98381295 0.97845601 0.98563734]

mean value: 0.9813014220580447

key: test_fscore
value: [0.98412698 0.92307692 0.98412698 0.96875    0.96875    0.98360656
 1.         0.96774194 0.98412698 0.93333333]

mean value: 0.9697639701652129

key: train_fscore
value: [0.98046181 0.97526502 0.98566308 0.98039216 0.98389982 0.97857143
 0.98214286 0.98395722 0.97864769 0.98576512]

mean value: 0.9814766206153425

key: test_precision
value: [0.96875    0.88235294 0.96875    0.93939394 0.93939394 1.
 1.         0.96774194 0.96875    0.93333333]

mean value: 0.9568466088781554

key: train_precision
value: [0.96842105 0.95833333 0.98214286 0.97173145 0.97864769 0.97163121
 0.9751773  0.97526502 0.96830986 0.97879859]

mean value: 0.972845835273727

key: test_recall
value: [1.         0.96774194 1.         1.         1.         0.96774194
 1.         0.96774194 1.         0.93333333]

mean value: 0.9836559139784946

key: train_recall
value: [0.99280576 0.99280576 0.98920863 0.98920863 0.98920863 0.98561151
 0.98920863 0.99280576 0.98920863 0.99283154]

mean value: 0.9902903483664681

key: test_roc_auc
value: [0.98387097 0.91935484 0.98387097 0.96774194 0.96774194 0.98387097
 1.         0.96774194 0.98333333 0.9344086 ]

mean value: 0.9691935483870968

key: train_roc_auc
value: [0.98021583 0.97482014 0.98561151 0.98021583 0.98381295 0.97841727
 0.98201439 0.98381295 0.97847528 0.9856244 ]

mean value: 0.9813020551300895

key: test_jcc
value: [0.96875    0.85714286 0.96875    0.93939394 0.93939394 0.96774194
 1.         0.9375     0.96875    0.875     ]

mean value: 0.9422422671414608

key: train_jcc
value: [0.96167247 0.95172414 0.97173145 0.96153846 0.96830986 0.95804196
 0.96491228 0.96842105 0.95818815 0.97192982]

mean value: 0.9636469650502072
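Random Forest2 is the only tree-based model in this section whose train_* scores fall below 1.0 (train MCC about 0.95 to 0.97). Its `min_samples_leaf=5` forbids leaves with fewer than five training samples, so its trees cannot isolate every training point. With the default `min_samples_leaf=1`, a tree can keep splitting until every leaf is pure. The illustration below uses a single decision tree and synthetic data with random labels. The random labels are an assumption chosen to make the contrast obvious.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # continuous features, so rows are distinct
y = rng.integers(0, 2, size=200)  # random labels: nothing real to learn

deep = DecisionTreeClassifier(random_state=42).fit(X, y)
leafy = DecisionTreeClassifier(min_samples_leaf=5, random_state=42).fit(X, y)

# Training accuracy: the unrestricted tree memorises the labels, the restricted one cannot
print(deep.score(X, y), leafy.score(X, y))
```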

MCC on Blind test: 0.09

Accuracy on Blind test: 0.2

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])
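In this pipeline BernoulliNB receives the MinMaxScaler output. `BernoulliNB` binarizes its inputs with `binarize=0.0` by default and maps every value greater than 0 to 1. After min-max scaling, each numerical feature therefore reduces to "equal to the column minimum or not". The sketch shows the effect with scikit-learn's `Binarizer` at the same threshold. The toy column is illustrative.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler

# Toy numerical column standing in for one scaled feature
x = np.array([[1.0], [2.0], [3.0], [4.0]])

scaled = MinMaxScaler().fit_transform(x)                  # [0, 1/3, 2/3, 1]
binary = Binarizer(threshold=0.0).fit_transform(scaled)   # same rule as BernoulliNB(binarize=0.0)
print(scaled.ravel(), binary.ravel())
```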

key: fit_time
value: [0.02149725 0.00856233 0.00855422 0.00861883 0.00868273 0.00835061
 0.00852489 0.00826049 0.00861621 0.008636  ]

mean value: 0.009830355644226074

key: score_time
value: [0.0092957  0.00866055 0.00866151 0.00840211 0.00860667 0.00837517
 0.00868773 0.00860476 0.00860882 0.00852346]

mean value: 0.00864264965057373

key: test_mcc
value: [0.61807005 0.65372045 0.45374261 0.71004695 0.51856298 0.71004695
 0.42023032 0.74193548 0.54251915 0.57419355]

mean value: 0.5943068479116385

key: train_mcc
value: [0.61176415 0.63718965 0.62604511 0.60075441 0.62596408 0.60075441
 0.65528703 0.62262853 0.64839945 0.64106733]

mean value: 0.6269854139141487

key: test_accuracy
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
 0.70967742 0.87096774 0.7704918  0.78688525]

mean value: 0.7960602855631941

key: train_accuracy
value: [0.8057554  0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
 0.82733813 0.81115108 0.82405745 0.82046679]

mean value: 0.8133732870077367

key: test_fscore
value: [0.79310345 0.8358209  0.71186441 0.85245902 0.76923077 0.85245902
 0.71875    0.87096774 0.76666667 0.78688525]

mean value: 0.7958207207099356

key: train_fscore
value: [0.80851064 0.82186949 0.81090909 0.79927667 0.8115942  0.79927667
 0.83098592 0.81415929 0.82624113 0.82269504]

mean value: 0.814551814377158

key: test_precision
value: [0.85185185 0.77777778 0.75       0.86666667 0.73529412 0.86666667
 0.6969697  0.87096774 0.79310345 0.77419355]

mean value: 0.7983491516178162

key: train_precision
value: [0.7972028  0.80622837 0.81985294 0.80363636 0.81751825 0.80363636
 0.8137931  0.80139373 0.81468531 0.81403509]

mean value: 0.8091982321605484

key: test_recall
value: [0.74193548 0.90322581 0.67741935 0.83870968 0.80645161 0.83870968
 0.74193548 0.87096774 0.74193548 0.8       ]

mean value: 0.7961290322580645

key: train_recall
value: [0.82014388 0.8381295  0.80215827 0.79496403 0.8057554  0.79496403
 0.84892086 0.82733813 0.8381295  0.83154122]

mean value: 0.8202044815760295

key: test_roc_auc
value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871
 0.70967742 0.87096774 0.77096774 0.78709677]

mean value: 0.7961290322580645

key: train_roc_auc
value: [0.8057554  0.81834532 0.81294964 0.80035971 0.81294964 0.80035971
 0.82733813 0.81115108 0.82408267 0.82044687]

mean value: 0.8133738170753719

key: test_jcc
value: [0.65714286 0.71794872 0.55263158 0.74285714 0.625      0.74285714
 0.56097561 0.77142857 0.62162162 0.64864865]

mean value: 0.6641111891208169

key: train_jcc
value: [0.67857143 0.69760479 0.68195719 0.66566265 0.68292683 0.66566265
 0.71084337 0.68656716 0.70392749 0.69879518]

mean value: 0.6872518746851146

MCC on Blind test: 0.18

Accuracy on Blind test: 0.52
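The per-fold test scores above can be cross-checked by hand. This is a minimal sketch that assumes the first test fold of the Naive Bayes (BernoulliNB) run has 62 rows, 31 of each class. That split is inferred from the logged values (recall 0.74193548 = 23/31, accuracy 0.80645161 = 50/62) and is not printed by the script. Under that assumption the confusion counts TP=23, FN=8, FP=4, TN=27 reproduce every fold-1 test metric. The counts are derived, not logged, and `binary_metrics` is a hypothetical helper rather than part of ml_data.py.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"mcc": mcc, "accuracy": accuracy, "fscore": f1,
            "precision": precision, "recall": recall, "jcc": jaccard}

# Counts inferred (NOT logged) for BernoulliNB, CV fold 1.
derived = binary_metrics(tp=23, fp=4, fn=8, tn=27)

# First value of each test_* key in the BernoulliNB block above.
logged = {"mcc": 0.61807005, "accuracy": 0.80645161, "fscore": 0.79310345,
          "precision": 0.85185185, "recall": 0.74193548, "jcc": 0.65714286}

for key, value in logged.items():
    print(f"{key:9s} logged={value:.8f} derived={derived[key]:.8f}")
```

All six derived values agree with the logged ones to the 8 decimals the log prints.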

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.09717035 0.05194712 0.05566955 0.05653906 0.06008887 0.0614419
 0.06166148 0.06023955 0.06417036 0.05456114]

mean value: 0.06234893798828125

key: score_time
value: [0.01015568 0.00965595 0.00964165 0.00960851 0.00993562 0.00997877
 0.01027107 0.00972724 0.00962043 0.00961185]

mean value: 0.009820675849914551

key: test_mcc
value: [1.         0.90369611 0.93548387 0.96824584 0.93743687 0.93743687
 1.         0.96824584 0.90586325 0.8688172 ]

mean value: 0.942522584980111

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.9516129  0.96774194 0.98387097 0.96774194 0.96774194
 1.         0.98387097 0.95081967 0.93442623]

mean value: 0.9707826546800635

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.95238095 0.96774194 0.98412698 0.96875    0.96666667
 1.         0.98360656 0.95384615 0.93333333]

mean value: 0.971045258321501

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.9375     0.96774194 0.96875    0.93939394 1.
 1.         1.         0.91176471 0.93333333]

mean value: 0.9658483914093496

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.96774194 0.96774194 1.         1.         0.93548387
 1.         0.96774194 1.         0.93333333]

mean value: 0.9772043010752688

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.9516129  0.96774194 0.98387097 0.96774194 0.96774194
 1.         0.98387097 0.95       0.9344086 ]

mean value: 0.9706989247311828

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.90909091 0.9375     0.96875    0.93939394 0.93548387
 1.         0.96774194 0.91176471 0.875     ]

mean value: 0.9444725360818814

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.2
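Each "mean value" line is the plain arithmetic mean of the 10 fold values printed above it. The short standard-library sketch below recomputes the XGBoost test_mcc mean and compares it with the blind-test MCC. The standard deviation and range are extra summaries that the script does not print. Every train_* score for XGBoost is 1.0, and the blind-test MCC of 0.06 falls well below the lowest CV fold. This suggests the cross-validated estimate does not carry over to the blind set.

```python
import statistics

# test_mcc per CV fold for XGBoost, copied from the log above.
test_mcc = [1.0, 0.90369611, 0.93548387, 0.96824584, 0.93743687, 0.93743687,
            1.0, 0.96824584, 0.90586325, 0.8688172]
blind_mcc = 0.06  # from the "MCC on Blind test" line

cv_mean = statistics.mean(test_mcc)
cv_sd = statistics.stdev(test_mcc)  # sample standard deviation across folds

print(f"CV mean MCC = {cv_mean:.4f} (sd {cv_sd:.4f}, "
      f"range {min(test_mcc)} to {max(test_mcc)})")
print(f"Blind-test MCC = {blind_mcc}; gap = {cv_mean - blind_mcc:.4f}")
print("blind score below every CV fold:", blind_mcc < min(test_mcc))
```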

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01567197 0.04181266 0.04561234 0.0498898  0.04169583 0.04362798
 0.04267979 0.04328203 0.0497613  0.0470922 ]

mean value: 0.04211258888244629

key: score_time
value: [0.01037931 0.02170181 0.01888394 0.01488948 0.01078224 0.02180099
 0.02077007 0.0193584  0.0108037  0.01081157]

mean value: 0.016018152236938477

key: test_mcc
value: [0.93548387 0.84266484 0.93548387 0.93743687 0.87278605 1.
 0.96824584 0.87278605 0.9344086  0.8688172 ]

mean value: 0.9168113188472994

key: train_mcc
value: [0.93914669 0.94653932 0.93563929 0.93914669 0.94266562 0.93195016
 0.93238486 0.94283651 0.93575728 0.9427658 ]

mean value: 0.9388832217176918

key: test_accuracy
value: [0.96774194 0.91935484 0.96774194 0.96774194 0.93548387 1.
 0.98387097 0.93548387 0.96721311 0.93442623]

mean value: 0.9579058699101005

key: train_accuracy
value: [0.96942446 0.97302158 0.9676259  0.96942446 0.97122302 0.96582734
 0.96582734 0.97122302 0.96768402 0.97127469]

mean value: 0.969255582966302

key: test_fscore
value: [0.96774194 0.92307692 0.96774194 0.96875    0.9375     1.
 0.98412698 0.93333333 0.96774194 0.93333333]

mean value: 0.9583346380322186

key: train_fscore
value: [0.96980462 0.97345133 0.96808511 0.96980462 0.97153025 0.96625222
 0.9664903  0.97163121 0.96808511 0.97163121]

mean value: 0.9696765956964183

key: test_precision
value: [0.96774194 0.88235294 0.96774194 0.93939394 0.90909091 1.
 0.96875    0.96551724 0.96774194 0.93333333]

mean value: 0.9501664170825576

key: train_precision
value: [0.95789474 0.95818815 0.95454545 0.95789474 0.96126761 0.95438596
 0.94809689 0.95804196 0.95454545 0.96140351]

mean value: 0.9566264459258345

key: test_recall
value: [0.96774194 0.96774194 0.96774194 1.         0.96774194 1.
 1.         0.90322581 0.96774194 0.93333333]

mean value: 0.9675268817204301

key: train_recall
value: [0.98201439 0.98920863 0.98201439 0.98201439 0.98201439 0.97841727
 0.98561151 0.98561151 0.98201439 0.98207885]

mean value: 0.9830999716355947

key: test_roc_auc
value: [0.96774194 0.91935484 0.96774194 0.96774194 0.93548387 1.
 0.98387097 0.93548387 0.9672043  0.9344086 ]

mean value: 0.9579032258064516

key: train_roc_auc
value: [0.96942446 0.97302158 0.9676259  0.96942446 0.97122302 0.96582734
 0.96582734 0.97122302 0.9677097  0.97125525]

mean value: 0.9692562079368764

key: test_jcc
value: [0.9375     0.85714286 0.9375     0.93939394 0.88235294 1.
 0.96875    0.875      0.9375     0.875     ]

mean value: 0.9210139737713268

key: train_jcc
value: [0.94137931 0.94827586 0.93814433 0.94137931 0.94463668 0.9347079
 0.93515358 0.94482759 0.93814433 0.94482759]

mean value: 0.9411476480564737

MCC on Blind test: 0.14

Accuracy on Blind test: 0.35
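Every logged test fraction is a whole number of rows over 62 (first eight folds) or 61 (last two folds). A 10-fold split of 618 rows produces exactly these fold sizes. The value 618 is an inference from the fractions and is not printed in this section. It also equals 2 × 309, the majority-class count in the "Original Data" summary earlier in the log, which would fit a class-balanced resample, but the log does not confirm this. The sketch below, written under that assumption, recovers the number of correctly classified rows per LDA test fold. `fold_sizes` is a hypothetical helper that follows the KFold sizing rule, in which the first n % k folds get one extra row.

```python
# ASSUMPTION: 618 training rows, inferred from the logged fractions.
N_ROWS, N_SPLITS = 618, 10

def fold_sizes(n, k):
    """KFold-style test-fold sizes: the first n % k folds get one extra row."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]

sizes = fold_sizes(N_ROWS, N_SPLITS)

# test_accuracy per fold for LDA, copied from the log above.
test_accuracy = [0.96774194, 0.91935484, 0.96774194, 0.96774194, 0.93548387, 1.0,
                 0.98387097, 0.93548387, 0.96721311, 0.93442623]

# accuracy x fold size should be (almost exactly) an integer row count.
correct = [acc * size for acc, size in zip(test_accuracy, sizes)]
print(sizes)
print([round(c) for c in correct])  # correctly classified rows per test fold
```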

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.02284217 0.00781918 0.00835109 0.00833607 0.00749111 0.00754762
 0.00822759 0.00809884 0.00830936 0.00828028]

mean value: 0.009530329704284668

key: score_time
value: [0.00877237 0.00818181 0.00863433 0.00789332 0.00792742 0.00775814
 0.00851727 0.00839686 0.00838804 0.00852823]

mean value: 0.008299779891967774

key: test_mcc
value: [0.74193548 0.55301004 0.55895656 0.69047575 0.60677988 0.80813523
 0.46358632 0.77459667 0.57576971 0.75310667]

mean value: 0.6526352311777236

key: train_mcc
value: [0.67282515 0.67609995 0.67144111 0.65172831 0.66087942 0.64772254
 0.68595876 0.65901019 0.68263871 0.65745214]

mean value: 0.6665756264215859

key: test_accuracy
value: [0.87096774 0.77419355 0.77419355 0.83870968 0.79032258 0.90322581
 0.72580645 0.88709677 0.78688525 0.86885246]

mean value: 0.8220253833950291

key: train_accuracy
value: [0.83273381 0.83453237 0.83273381 0.82194245 0.82733813 0.82014388
 0.83992806 0.82553957 0.83842011 0.82585278]

mean value: 0.8299164976815675

key: test_fscore
value: [0.87096774 0.78787879 0.79411765 0.85294118 0.81690141 0.90625
 0.75362319 0.88888889 0.8        0.87878788]

mean value: 0.8350356717876952

key: train_fscore
value: [0.84422111 0.84563758 0.84317032 0.83472454 0.83838384 0.83277592
 0.84991568 0.83806344 0.84797297 0.83697479]

mean value: 0.8411840193764768

key: test_precision
value: [0.87096774 0.74285714 0.72972973 0.78378378 0.725      0.87878788
 0.68421053 0.875      0.76470588 0.80555556]

mean value: 0.7860598241318305

key: train_precision
value: [0.78996865 0.79245283 0.79365079 0.7788162  0.78797468 0.778125
 0.8        0.78193146 0.79936306 0.78797468]

mean value: 0.789025736384194

key: test_recall
value: [0.87096774 0.83870968 0.87096774 0.93548387 0.93548387 0.93548387
 0.83870968 0.90322581 0.83870968 0.96666667]

mean value: 0.8934408602150538

key: train_recall
value: [0.90647482 0.90647482 0.89928058 0.89928058 0.89568345 0.89568345
 0.90647482 0.9028777  0.9028777  0.89247312]

mean value: 0.9007581031948635

key: test_roc_auc
value: [0.87096774 0.77419355 0.77419355 0.83870968 0.79032258 0.90322581
 0.72580645 0.88709677 0.78602151 0.87043011]

mean value: 0.8220967741935484

key: train_roc_auc
value: [0.83273381 0.83453237 0.83273381 0.82194245 0.82733813 0.82014388
 0.83992806 0.82553957 0.83853562 0.82573296]

mean value: 0.829916067146283

key: test_jcc
value: [0.77142857 0.65       0.65853659 0.74358974 0.69047619 0.82857143
 0.60465116 0.8        0.66666667 0.78378378]

mean value: 0.7197704132672936

key: train_jcc
value: [0.73043478 0.73255814 0.72886297 0.71633238 0.72173913 0.71346705
 0.73900293 0.72126437 0.73607038 0.71965318]

mean value: 0.7259385314063227

MCC on Blind test: 0.21

Accuracy on Blind test: 0.5
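The test_jcc (Jaccard) values are not independent of test_fscore. For a single positive class, F1 = 2TP/(2TP+FP+FN) and J = TP/(TP+FP+FN), so J = F1 / (2 - F1). The sketch below applies that identity to the Multinomial fold values copied from the log. It is a useful sanity check when reading these blocks, because one of the two keys carries no extra information.

```python
# test_fscore and test_jcc per fold for MultinomialNB, copied from the log above.
test_fscore = [0.87096774, 0.78787879, 0.79411765, 0.85294118, 0.81690141,
               0.90625, 0.75362319, 0.88888889, 0.8, 0.87878788]
test_jcc = [0.77142857, 0.65, 0.65853659, 0.74358974, 0.69047619,
            0.82857143, 0.60465116, 0.8, 0.66666667, 0.78378378]

def jaccard_from_f1(f1):
    """Binary Jaccard index expressed through F1: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)

derived = [jaccard_from_f1(f) for f in test_fscore]
max_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest |derived - logged| = {max_diff:.1e}")
```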

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01066303 0.01422668 0.01254988 0.01286054 0.01497436 0.01457095
 0.01441455 0.01421928 0.01397729 0.01655555]

mean value: 0.013901209831237793

key: score_time
value: [0.00798249 0.01007485 0.01002264 0.01035452 0.01045942 0.01085925
 0.01050234 0.01053119 0.01049948 0.01047754]

mean value: 0.010176372528076173

key: test_mcc
value: [0.84983659 0.90369611 0.96824584 0.93743687 0.83914639 1.
 0.93743687 0.84266484 0.9344086  0.90215054]

mean value: 0.9115022641468933

key: train_mcc
value: [0.85210391 0.96048758 0.90882979 0.95324358 0.8782527  0.93534863
 0.935276   0.91827075 0.93969601 0.97130001]

mean value: 0.9252808948765114

key: test_accuracy
value: [0.91935484 0.9516129  0.98387097 0.96774194 0.91935484 1.
 0.96774194 0.91935484 0.96721311 0.95081967]

mean value: 0.9547065044949762

key: train_accuracy
value: [0.92266187 0.98021583 0.95323741 0.97661871 0.93705036 0.9676259
 0.9676259  0.95863309 0.96947935 0.98563734]

mean value: 0.9618785761337071

key: test_fscore
value: [0.9122807  0.95238095 0.98412698 0.96875    0.91803279 1.
 0.96666667 0.91525424 0.96774194 0.95081967]

mean value: 0.9536053936717389

key: train_fscore
value: [0.91746641 0.980322   0.95486111 0.97666068 0.93383743 0.96785714
 0.96750903 0.95764273 0.97001764 0.98561151]

mean value: 0.961178567797733

key: test_precision
value: [1.         0.9375     0.96875    0.93939394 0.93333333 1.
 1.         0.96428571 0.96774194 0.93548387]

mean value: 0.96464887934646

key: train_precision
value: [0.98353909 0.97508897 0.92281879 0.97491039 0.98406375 0.96099291
 0.97101449 0.98113208 0.95155709 0.98916968]

mean value: 0.9694287238395796

key: test_recall
value: [0.83870968 0.96774194 1.         1.         0.90322581 1.
 0.93548387 0.87096774 0.96774194 0.96666667]

mean value: 0.9450537634408602

key: train_recall
value: [0.85971223 0.98561151 0.98920863 0.97841727 0.88848921 0.97482014
 0.96402878 0.9352518  0.98920863 0.98207885]

mean value: 0.9546827054485444

key: test_roc_auc
value: [0.91935484 0.9516129  0.98387097 0.96774194 0.91935484 1.
 0.96774194 0.91935484 0.9672043  0.95107527]

mean value: 0.954731182795699

key: train_roc_auc
value: [0.92266187 0.98021583 0.95323741 0.97661871 0.93705036 0.9676259
 0.9676259  0.95863309 0.96951471 0.98564374]

mean value: 0.9618827518630257

key: test_jcc
value: [0.83870968 0.90909091 0.96875    0.93939394 0.84848485 1.
 0.93548387 0.84375    0.9375     0.90625   ]

mean value: 0.9127413245356794

key: train_jcc
value: [0.84751773 0.96140351 0.91362126 0.95438596 0.87588652 0.93771626
 0.93706294 0.91872792 0.94178082 0.97163121]

mean value: 0.925973413428646

MCC on Blind test: 0.13

Accuracy on Blind test: 0.39
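In the last two folds, test_roc_auc differs slightly from test_accuracy (0.9672043 vs 0.96721311, and 0.95107527 vs 0.95081967). Both values match balanced accuracy, (TPR + TNR) / 2, which is the area under the ROC curve when the scorer receives hard 0/1 predictions rather than probabilities or decision scores. The log does not state that the script scores this way; that is an inference from the numbers. The confusion counts below are also derived rather than logged. They come from the logged recall and precision, assuming fold 9 holds 31 positives and 30 negatives and fold 10 the reverse. `auc_from_hard_labels` is a hypothetical helper.

```python
# Fold 9 and 10 counts for PassiveAggressiveClassifier, inferred from the logged
# recall/precision (e.g. recall 0.96774194 = 30/31); they are NOT printed in the log.
folds = {
    9: dict(tp=30, fn=1, fp=1, tn=29),
    10: dict(tp=29, fn=1, fp=2, tn=29),
}
logged_roc_auc = {9: 0.9672043, 10: 0.95107527}

def auc_from_hard_labels(tp, fn, fp, tn):
    """Area under the single-threshold ROC curve (0,0)->(FPR,TPR)->(1,1),
    which equals balanced accuracy (TPR + TNR) / 2."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2

for fold, counts in folds.items():
    print(fold, round(auc_from_hard_labels(**counts), 8), logged_roc_auc[fold])
```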

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01376939 0.01323795 0.01437616 0.01475048 0.01206875 0.01291966
 0.01519465 0.0118556  0.01220989 0.01324606]

mean value: 0.013362860679626465

key: score_time
value: [0.01042581 0.01045299 0.01043582 0.01047158 0.01042938 0.01043868
 0.0104847  0.01037741 0.0103879  0.01042461]

mean value: 0.010432887077331542

key: test_mcc
value: [0.90748521 0.87278605 0.93548387 0.87096774 0.90369611 0.90369611
 0.84983659 0.84983659 0.84710837 0.83638369]

mean value: 0.8777280337009071

key: train_mcc
value: [0.91267965 0.91482985 0.90302377 0.91827075 0.91106862 0.91267965
 0.89008997 0.90161686 0.7528037  0.94982722]

mean value: 0.8966890034959883

key: test_accuracy
value: [0.9516129  0.93548387 0.96774194 0.93548387 0.9516129  0.9516129
 0.91935484 0.91935484 0.91803279 0.91803279]

mean value: 0.9368323638286621

key: train_accuracy
value: [0.95503597 0.95683453 0.95143885 0.95863309 0.95503597 0.95503597
 0.94244604 0.94964029 0.86355476 0.97486535]

mean value: 0.9462520827144388

key: test_fscore
value: [0.95384615 0.9375     0.96774194 0.93548387 0.95238095 0.95238095
 0.9122807  0.9122807  0.92537313 0.91525424]

mean value: 0.9364522640184937

key: train_fscore
value: [0.95667244 0.95789474 0.95099819 0.95764273 0.95395948 0.95667244
 0.9391635  0.94776119 0.87898089 0.97508897]

mean value: 0.9474834571073163

key: test_precision
value: [0.91176471 0.90909091 0.96774194 0.93548387 0.9375     0.9375
 1.         1.         0.86111111 0.93103448]

mean value: 0.9391227015294606

key: train_precision
value: [0.92307692 0.93493151 0.95970696 0.98113208 0.97735849 0.92307692
 0.99596774 0.98449612 0.78857143 0.96819788]

mean value: 0.9436516053144435

key: test_recall
value: [1.         0.96774194 0.96774194 0.93548387 0.96774194 0.96774194
 0.83870968 0.83870968 1.         0.9       ]

mean value: 0.9383870967741935

key: train_recall
value: [0.99280576 0.98201439 0.94244604 0.9352518  0.93165468 0.99280576
 0.88848921 0.91366906 0.99280576 0.98207885]

mean value: 0.9554021299089761

key: test_roc_auc
value: [0.9516129  0.93548387 0.96774194 0.93548387 0.9516129  0.9516129
 0.91935484 0.91935484 0.91666667 0.91774194]

mean value: 0.9366666666666668

key: train_roc_auc
value: [0.95503597 0.95683453 0.95143885 0.95863309 0.95503597 0.95503597
 0.94244604 0.94964029 0.86378639 0.97485238]

mean value: 0.946273948583069

key: test_jcc
value: [0.91176471 0.88235294 0.9375     0.87878788 0.90909091 0.90909091
 0.83870968 0.83870968 0.86111111 0.84375   ]

mean value: 0.8810867809978341

key: train_jcc
value: [0.91694352 0.91919192 0.90657439 0.91872792 0.91197183 0.91694352
 0.88530466 0.90070922 0.78409091 0.95138889]

mean value: 0.9011846780361379

MCC on Blind test: 0.13

Accuracy on Blind test: 0.33
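In these blocks `test_roc_auc` matches `test_accuracy` on every 62-row fold, and any differences appear only on the two 61-row folds. That pattern is consistent with ROC AUC being scored on hard 0/1 predictions rather than on probabilities or decision scores. With hard labels the ROC curve has a single interior point, so the area reduces to balanced accuracy, (TPR + TNR) / 2, which equals plain accuracy only when a fold is class-balanced. The sketch below checks this on the ninth SGD fold. Its confusion counts (TP=31, FN=0, FP=5, TN=25) are back-calculated from that fold's recall (1.0), precision (0.86111111 = 31/36) and accuracy (0.91803279 = 56/61). They are an inference, not values printed by the script, and both helper functions are hypothetical.

```python
def balanced_accuracy(tp, fn, fp, tn):
    """Mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def roc_auc_from_hard_labels(tp, fn, fp, tn):
    """Trapezoidal area under the ROC curve (0,0) -> (FPR, TPR) -> (1,1).

    With 0/1 predictions there is only one threshold between the two
    extreme points, so the curve has a single interior vertex.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    left = fpr * (0 + tpr) / 2            # trapezoid from (0,0) to (FPR,TPR)
    right = (1 - fpr) * (tpr + 1) / 2     # trapezoid from (FPR,TPR) to (1,1)
    return left + right


# Counts inferred (not logged) for SGD fold 9, which has 61 test rows.
tp, fn, fp, tn = 31, 0, 5, 25
auc = roc_auc_from_hard_labels(tp, fn, fp, tn)
bacc = balanced_accuracy(tp, fn, fp, tn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(f"AUC={auc:.8f} balanced_acc={bacc:.8f} accuracy={accuracy:.8f}")
```

Algebraically the two trapezoids sum to (1 + TPR - FPR) / 2, which is the same quantity as balanced accuracy; if probability-based AUC is wanted, the scorer would need access to `predict_proba` or `decision_function` output.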

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10950232 0.09739041 0.09704351 0.09424162 0.09430504 0.09652781
 0.09616399 0.10172677 0.10454583 0.09388137]

mean value: 0.09853286743164062

key: score_time
value: [0.01543546 0.014148   0.01423931 0.01424742 0.0141356  0.0143621
 0.01467228 0.01546311 0.01425433 0.01426816]

mean value: 0.014522576332092285

key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.96824584 0.93743687 0.96824584
 1.         1.         0.90586325 0.93649139]

mean value: 0.958825873085774

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.96774194 0.98387097
 1.         1.         0.95081967 0.96721311]

mean value: 0.9789000528820729

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.98412698 0.96875    0.98360656
 1.         1.         0.95384615 0.96774194]

mean value: 0.9793026681072028

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.96774194 0.96875    0.93939394 1.
 1.         1.         0.91176471 0.9375    ]

mean value: 0.9725150580760163

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 0.96774194 0.96774194 1.         1.         0.96774194
 1.         1.         1.         1.        ]

mean value: 0.9870967741935484

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.96774194 0.98387097
 1.         1.         0.95       0.96774194]

mean value: 0.9788709677419355

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96774194 0.96774194 0.9375     0.96875    0.93939394 0.96774194
 1.         1.         0.91176471 0.9375    ]

mean value: 0.9598134451727905

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.21
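Each `mean value` line appears to be the plain arithmetic mean of the ten per-fold values above it. For example, AdaBoost's `test_recall` has four folds at 0.96774194 (= 30/31) and six at 1.0, giving 0.98709677. A short stdlib sketch of that summary step follows; `report_cv_scores` is a hypothetical helper, not the script's own function, and it additionally returns the population standard deviation, which the log does not print but which shows how much the folds disagree.

```python
from statistics import fmean, pstdev

# Per-fold recall for the AdaBoost run, copied from the log above.
scores = {
    "test_recall": [0.96774194, 0.96774194, 0.96774194, 1.0, 1.0,
                    0.96774194, 1.0, 1.0, 1.0, 1.0],
    "train_recall": [1.0] * 10,
}


def report_cv_scores(score_dict):
    """Print each metric in a key/value/mean layout similar to the log
    and return {metric: (mean, population std)}."""
    summary = {}
    for key, values in score_dict.items():
        mean = fmean(values)
        spread = pstdev(values)
        print(f"key: {key}\nvalue: {values}\n\nmean value: {mean}\n")
        summary[key] = (mean, spread)
    return summary


summary = report_cv_scores(scores)
```

Because the last two folds hold 61 rows rather than 62, the mean of fold scores is close to, but not necessarily identical to, a score pooled over all out-of-fold predictions.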

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
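The Bagging run emits "Some inputs do not have OOB scores" followed by an `invalid value encountered in true_divide` warning. `BaggingClassifier` defaults to `n_estimators=10`, and with `oob_score=True` a training row only receives an out-of-bag prediction if at least one bootstrap sample left it out; rows that are never left out have zero OOB votes, and the 0/0 vote ratio becomes NaN. The back-of-the-envelope sketch below assumes bootstrap samples of size n drawn with replacement and a training-fold size of about 556 rows (inferred from train accuracies such as 0.99820144 = 555/556); the function names are hypothetical.

```python
import random


def prob_never_oob(n_rows, n_estimators):
    """Probability that a given row is in-bag for every bootstrap sample.

    Each bootstrap draws n_rows times with replacement, so a fixed row is
    out-of-bag for one estimator with probability (1 - 1/n)^n (about e^-1).
    """
    p_out = (1 - 1 / n_rows) ** n_rows
    return (1 - p_out) ** n_estimators


def simulate_rows_without_oob(n_rows, n_estimators, seed=42):
    """Monte Carlo check: count rows that are never out-of-bag."""
    rng = random.Random(seed)
    seen_oob = [False] * n_rows
    for _ in range(n_estimators):
        in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
        for i in range(n_rows):
            if i not in in_bag:
                seen_oob[i] = True
    return seen_oob.count(False)


n_rows = 556  # assumed training-fold size
for n_est in (10, 100):
    expected = n_rows * prob_never_oob(n_rows, n_est)
    print(f"n_estimators={n_est}: expected rows with no OOB score = {expected:.2f}")
print("simulated (n_estimators=10):", simulate_rows_without_oob(n_rows, 10))
```

Under these assumptions roughly five or six rows per fit lack an OOB estimate with 10 estimators, while 100 estimators make the expected count negligible; whether raising `n_estimators` would change the reported CV or blind-test scores is not tested here.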

key: fit_time
value: [0.03574371 0.03871679 0.05276203 0.05378747 0.05436444 0.04580855
 0.04448795 0.03396726 0.04315591 0.03275442]

mean value: 0.04355485439300537

key: score_time
value: [0.02251959 0.03061891 0.03506684 0.03417039 0.03329325 0.02393937
 0.03167629 0.02193832 0.03418398 0.01938605]

mean value: 0.028679299354553222

key: test_mcc
value: [1.         0.87096774 1.         0.96824584 0.90369611 0.90748521
 0.93743687 0.96824584 1.         0.8688172 ]

mean value: 0.9424894812989454

key: train_mcc
value: [0.99640932 0.99640932 0.99280576 0.99283145 0.98561151 0.99280576
 0.99640932 0.99640932 0.99641572 0.99641577]

mean value: 0.9942523261997296

key: test_accuracy
value: [1.         0.93548387 1.         0.98387097 0.9516129  0.9516129
 0.96774194 0.98387097 1.         0.93442623]

mean value: 0.9708619777895293

key: train_accuracy
value: [0.99820144 0.99820144 0.99640288 0.99640288 0.99280576 0.99640288
 0.99820144 0.99820144 0.99820467 0.99820467]

mean value: 0.9971229479612002

key: test_fscore
value: [1.         0.93548387 1.         0.98412698 0.95238095 0.94915254
 0.96666667 0.98360656 1.         0.93333333]

mean value: 0.9704750907225609

key: train_fscore
value: [0.9981982  0.9981982  0.99640288 0.99638989 0.99280576 0.99640288
 0.99820467 0.9981982  0.9981982  0.99820467]

mean value: 0.997120353100802

key: test_precision
value: [1.         0.93548387 1.         0.96875    0.9375     1.
 1.         1.         1.         0.93333333]

mean value: 0.9775067204301076

key: train_precision
value: [1.         1.         0.99640288 1.         0.99280576 0.99640288
 0.99641577 1.         1.         1.        ]

mean value: 0.9982027281400686

key: test_recall
value: [1.         0.93548387 1.         1.         0.96774194 0.90322581
 0.93548387 0.96774194 1.         0.93333333]

mean value: 0.9643010752688173

key: train_recall
value: [0.99640288 0.99640288 0.99640288 0.99280576 0.99280576 0.99640288
 1.         0.99640288 0.99640288 0.99641577]

mean value: 0.9960444547587737

key: test_roc_auc
value: [1.         0.93548387 1.         0.98387097 0.9516129  0.9516129
 0.96774194 0.98387097 1.         0.9344086 ]

mean value: 0.9708602150537635

key: train_roc_auc
value: [0.99820144 0.99820144 0.99640288 0.99640288 0.99280576 0.99640288
 0.99820144 0.99820144 0.99820144 0.99820789]

mean value: 0.9971229468038473

key: test_jcc
value: [1.         0.87878788 1.         0.96875    0.90909091 0.90322581
 0.93548387 0.96774194 1.         0.875     ]

mean value: 0.9438080400782014

key: train_jcc
value: [0.99640288 0.99640288 0.99283154 0.99280576 0.98571429 0.99283154
 0.99641577 0.99640288 0.99640288 0.99641577]

mean value: 0.994262617555725

MCC on Blind test: 0.06

Accuracy on Blind test: 0.21
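Each per-fold score in these blocks (MCC, accuracy, F-score, precision, recall, Jaccard) is a function of that fold's 2x2 confusion matrix. As a consistency check, the sketch below recomputes the Bagging run's second fold. The counts TP=29, FN=2, FP=2, TN=29 are back-calculated from that fold's precision and recall (both 0.93548387 = 29/31) over 62 test rows, so they are an inference rather than logged values. `confusion_metrics` is a hypothetical helper, and treating `jcc` as the Jaccard index of the positive class is an assumption that the matching value supports.

```python
from math import sqrt


def confusion_metrics(tp, fn, fp, tn):
    """Binary metrics from confusion-matrix counts (positive class = 1).

    Assumes tp + fp > 0 and tp + fn > 0 so precision and recall are defined.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # MCC is set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fscore": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
    }


for name, value in confusion_metrics(tp=29, fn=2, fp=2, tn=29).items():
    print(f"{name}: {value:.8f}")
```

All six printed values agree to eight decimals with the second entry of the corresponding `test_*` arrays for the Bagging Classifier (for instance MCC = 837/961 = 0.87096774 and Jaccard = 29/33 = 0.87878788).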

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.18538761 0.21855521 0.19818258 0.16929436 0.21186471 0.22627425
 0.19866776 0.15631485 0.20869946 0.18409514]

mean value: 0.19573359489440917

key: score_time
value: [0.02069068 0.01292896 0.04060698 0.02179265 0.02101707 0.0241735
 0.01276922 0.02045441 0.02064705 0.03370333]

mean value: 0.022878384590148924

key: test_mcc
value: [0.90748521 0.62471615 0.77459667 0.83914639 0.7190925  0.80813523
 0.64820372 0.83914639 0.63939757 0.77096774]

mean value: 0.7570887579844512

key: train_mcc
value: [0.88143754 0.84999939 0.88509826 0.86366703 0.87437795 0.88157448
 0.87826623 0.87806148 0.8713058  0.88511972]

mean value: 0.8748907880078497

key: test_accuracy
value: [0.9516129  0.80645161 0.88709677 0.91935484 0.85483871 0.90322581
 0.82258065 0.91935484 0.81967213 0.8852459 ]

mean value: 0.8769434161819143

key: train_accuracy
value: [0.94064748 0.92446043 0.94244604 0.93165468 0.93705036 0.94064748
 0.93884892 0.93884892 0.93536804 0.94254937]

mean value: 0.9372521731268486

key: test_fscore
value: [0.94915254 0.82352941 0.88888889 0.92063492 0.86567164 0.90625
 0.83076923 0.91803279 0.82539683 0.8852459 ]

mean value: 0.8813572150143087

key: train_fscore
value: [0.94117647 0.92631579 0.9430605  0.93262411 0.93783304 0.94138544
 0.93992933 0.93971631 0.93639576 0.94285714]

mean value: 0.9381293887479757

key: test_precision
value: [1.         0.75675676 0.875      0.90625    0.80555556 0.87878788
 0.79411765 0.93333333 0.8125     0.87096774]

mean value: 0.8633268913427832

key: train_precision
value: [0.93286219 0.90410959 0.93309859 0.91958042 0.92631579 0.92982456
 0.92361111 0.92657343 0.92013889 0.93950178]

mean value: 0.9255616347793583

key: test_recall
value: [0.90322581 0.90322581 0.90322581 0.93548387 0.93548387 0.93548387
 0.87096774 0.90322581 0.83870968 0.9       ]

mean value: 0.9029032258064515

key: train_recall
value: [0.94964029 0.94964029 0.95323741 0.94604317 0.94964029 0.95323741
 0.95683453 0.95323741 0.95323741 0.94623656]

mean value: 0.9510984760578634

key: test_roc_auc
value: [0.9516129  0.80645161 0.88709677 0.91935484 0.85483871 0.90322581
 0.82258065 0.91935484 0.81935484 0.88548387]

mean value: 0.8769354838709678

key: train_roc_auc
value: [0.94064748 0.92446043 0.94244604 0.93165468 0.93705036 0.94064748
 0.93884892 0.93884892 0.93540007 0.94254274]

mean value: 0.9372547123591449

key: test_jcc
value: [0.90322581 0.7        0.8        0.85294118 0.76315789 0.82857143
 0.71052632 0.84848485 0.7027027  0.79411765]

mean value: 0.790372782026632

key: train_jcc
value: [0.88888889 0.8627451  0.89225589 0.87375415 0.88294314 0.88926174
 0.88666667 0.88628763 0.88039867 0.89189189]

mean value: 0.8835093775860033

MCC on Blind test: 0.22

Accuracy on Blind test: 0.49

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.24915671 0.24478769 0.25460291 0.25357485 0.25452995 0.24537802
 0.2443285  0.24822903 0.24816132 0.25430918]

mean value: 0.24970581531524658

key: score_time
value: [0.00863647 0.0090971  0.00848818 0.00925422 0.00943804 0.00865912
 0.00864434 0.00895667 0.00889111 0.00870037]

mean value: 0.008876562118530273

key: test_mcc
value: [1.         0.87096774 1.         0.96824584 0.93743687 0.90748521
 0.96824584 0.96824584 1.         0.8688172 ]

mean value: 0.9489444535426244

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [1.         0.93548387 1.         0.98387097 0.96774194 0.9516129
 0.98387097 0.98387097 1.         0.93442623]

mean value: 0.9740877842411423

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [1.         0.93548387 1.         0.98412698 0.96875    0.94915254
 0.98360656 0.98360656 1.         0.93333333]

mean value: 0.9738059845555039

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         0.93548387 1.         0.96875    0.93939394 1.
 1.         1.         1.         0.93333333]

mean value: 0.9776961143695014

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [1.         0.93548387 1.         1.         1.         0.90322581
 0.96774194 0.96774194 1.         0.93333333]

mean value: 0.970752688172043

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [1.         0.93548387 1.         0.98387097 0.96774194 0.9516129
 0.98387097 0.98387097 1.         0.9344086 ]

mean value: 0.9740860215053764

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [1.         0.87878788 1.         0.96875    0.93939394 0.90322581
 0.96774194 0.96774194 1.         0.875     ]

mean value: 0.9500641495601173

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.19
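The QDA run below emits `UserWarning: Variables are collinear`. One likely contributor, though the log does not confirm it, is the `OneHotEncoder()` step: without `drop`, the dummy columns produced for each categorical feature sum to 1 in every row, so the covariance matrix that QDA estimates per class is singular. The stdlib sketch below uses a made-up three-level feature (all names hypothetical) to show the determinant collapsing to zero when every dummy is kept and becoming positive once one level is dropped, which is what `OneHotEncoder(drop='first')` does.

```python
def covariance_matrix(columns):
    """Population covariance matrix of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / n
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        # Absolute tolerance suited to this small, well-scaled example.
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


# Hypothetical categorical feature with three levels, one-hot encoded by hand.
levels = ["A", "B", "C"]
values = ["A", "B", "C", "A", "B", "A", "C", "B", "A"]
one_hot = [[1.0 if v == level else 0.0 for v in values] for level in levels]

det_all = determinant(covariance_matrix(one_hot))       # all dummies kept
det_drop = determinant(covariance_matrix(one_hot[1:]))  # first level dropped
print(f"det(all dummies) = {det_all:.3e}, det(drop first) = {det_drop:.3e}")
```

Dropping a level removes this particular linear dependency, but other features could still be collinear or constant within a class, so the warning might persist.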

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
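The model list printed above holds 25 (name, estimator) pairs, but the name 'Naive Bayes' (BernoulliNB) appears twice. The stdlib check below uses the names copied from the log and makes the duplicate explicit. Whether this matters depends on how results are stored downstream. If they are collected in a dict keyed by model name (an assumption, since that code is not in this log), the second BernoulliNB entry would silently overwrite the first.

```python
from collections import Counter

# Model names in the order printed by "List of models" above (spelling kept as logged)
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees", "Extra Tree",
    "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost", "LDA", "Multinomial",
    "Passive Aggresive", "Stochastic GDescent", "AdaBoost Classifier",
    "Bagging Classifier", "Gaussian Process", "Gradient Boosting", "QDA",
    "Ridge Classifier", "Ridge ClassifierCV",
]

# Names that occur more than once
duplicates = sorted(name for name, count in Counter(model_names).items() if count > 1)
print(duplicates)  # ['Naive Bayes']

# A dict keyed by name keeps only the last entry for a repeated name
results = {name: position for position, name in enumerate(model_names)}
print(len(model_names), len(results))  # 25 24
```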
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
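The 'prep' step of each pipeline is a ColumnTransformer that min-max scales the numerical columns and one-hot encodes the categorical ones. The stdlib sketch below mirrors what MinMaxScaler (default feature_range=(0, 1)) and OneHotEncoder (categories='auto', which sorts the observed categories) compute for one column each. The column values are hypothetical, and this is an illustration of the arithmetic, not the scikit-learn implementation.

```python
def min_max_scale(column):
    """Rescale values to [0, 1], as MinMaxScaler does with its default feature_range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # MinMaxScaler maps a constant column to 0 instead of dividing by zero
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def one_hot(column):
    """One 0/1 indicator column per distinct category, categories in sorted order."""
    categories = sorted(set(column))
    return categories, [[1 if x == c else 0 for c in categories] for x in column]


ligand_distance = [4.0, 12.5, 21.0]   # hypothetical numerical column
active_site = ["no", "yes", "no"]     # hypothetical categorical column

scaled = min_max_scale(ligand_distance)
categories, encoded = one_hot(active_site)

# Numerical block first, then the categorical block, following the transformer order
rows = [[s] + e for s, e in zip(scaled, encoded)]
print(rows)  # [[0.0, 1, 0], [0.5, 0, 1], [1.0, 1, 0]]
```

Because the transformer list puts 'num' before 'cat', the scaled numerical block precedes the indicator columns in the transformed matrix; any column named in neither list would be appended after them by remainder='passthrough'.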

key: fit_time
value: [0.01151943 0.01374769 0.01429176 0.01392269 0.01409912 0.01630855
 0.01395249 0.01366401 0.01401973 0.01421189]

mean value: 0.013973736763000488

key: score_time
value: [0.01094151 0.01091838 0.01081514 0.01111507 0.01110363 0.01155281
 0.01108122 0.01173496 0.0118649  0.0110836 ]

mean value: 0.01122112274169922

key: test_mcc
value: [0.75623534 0.7130241  0.67419986 0.87831007 0.35659298 0.7284928
 0.61807005 0.87278605 0.70874158 0.47128445]

mean value: 0.6777737268610616

key: train_mcc
value: [0.7898587  0.84192273 0.79323895 0.88226013 0.52711711 0.8046478
 0.84911865 0.84598626 0.839052   0.5797551 ]

mean value: 0.7752957434463477

key: test_accuracy
value: [0.87096774 0.85483871 0.82258065 0.93548387 0.64516129 0.85483871
 0.80645161 0.93548387 0.85245902 0.72131148]

mean value: 0.8299576943416181

key: train_accuracy
value: [0.88848921 0.92086331 0.88848921 0.94064748 0.71942446 0.89748201
 0.92446043 0.92266187 0.91741472 0.76481149]

mean value: 0.8784744197460703

key: test_fscore
value: [0.88235294 0.86153846 0.79245283 0.93939394 0.5        0.86956522
 0.79310345 0.9375     0.84745763 0.65306122]

mean value: 0.8076425689573157

key: train_fscore
value: [0.89768977 0.92       0.876      0.94200351 0.6119403  0.9048414
 0.92363636 0.92416226 0.91287879 0.70561798]

mean value: 0.861877037129891

key: test_precision
value: [0.81081081 0.82352941 0.95454545 0.88571429 0.84615385 0.78947368
 0.85185185 0.90909091 0.89285714 0.84210526]

mean value: 0.8606132660157428

key: train_precision
value: [0.82926829 0.93014706 0.98648649 0.9209622  0.99193548 0.84423676
 0.93382353 0.90657439 0.964      0.94578313]

mean value: 0.9253217337706788

key: test_recall
value: [0.96774194 0.90322581 0.67741935 1.         0.35483871 0.96774194
 0.74193548 0.96774194 0.80645161 0.53333333]

mean value: 0.7920430107526881

key: train_recall
value: [0.97841727 0.91007194 0.78776978 0.96402878 0.44244604 0.97482014
 0.91366906 0.94244604 0.86690647 0.56272401]

mean value: 0.8343299553905263

key: test_roc_auc
value: [0.87096774 0.85483871 0.82258065 0.93548387 0.64516129 0.85483871
 0.80645161 0.93548387 0.85322581 0.71827957]

mean value: 0.8297311827956989

key: train_roc_auc
value: [0.88848921 0.92086331 0.88848921 0.94064748 0.71942446 0.89748201
 0.92446043 0.92266187 0.91732421 0.76517496]

mean value: 0.8785017147572265

key: test_jcc
value: [0.78947368 0.75675676 0.65625    0.88571429 0.33333333 0.76923077
 0.65714286 0.88235294 0.73529412 0.48484848]

mean value: 0.6950397230060543

key: train_jcc
value: [0.81437126 0.85185185 0.77935943 0.89036545 0.44086022 0.82621951
 0.85810811 0.85901639 0.83972125 0.54513889]

mean value: 0.7705012360490753

MCC on Blind test: 0.12

Accuracy on Blind test: 0.77
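The "Variables are collinear" UserWarnings from discriminant_analysis.py appear just before this QDA run. They are raised when the centered data of a class has lower rank than the number of features, which makes that class's covariance matrix singular. One likely contributor is an assumption, not something this log identifies: OneHotEncoder keeps every indicator column, and the indicators of a single categorical feature always add up to 1, so their sum has zero variance. The stdlib sketch below, on hypothetical ss_class labels, shows that every row of the resulting covariance matrix sums to zero, i.e. the all-ones vector lies in its null space. In scikit-learn, OneHotEncoder(drop='first') or QuadraticDiscriminantAnalysis(reg_param=...) are the usual knobs to try; which one suits this dataset is untested here.

```python
def one_hot(column):
    """Full 0/1 indicator encoding, one column per sorted category."""
    categories = sorted(set(column))
    return [[1.0 if x == c else 0.0 for c in categories] for x in column]


def covariance_matrix(rows):
    """Sample covariance (divisor n - 1) between the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


# Hypothetical secondary-structure labels for the samples of one class
ss_class = ["helix", "strand", "coil", "helix", "coil", "strand", "helix"]
cov = covariance_matrix(one_hot(ss_class))

# cov @ ones is just each row sum; because the indicators add up to 1 in every
# sample, each row sum is zero up to floating-point error, so cov is singular
cov_times_ones = [sum(row) for row in cov]
print(max(abs(v) for v in cov_times_ones))
```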

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.03200245 0.03018045 0.02014041 0.03684139 0.03164029 0.0324831
 0.03526735 0.02932453 0.02801824 0.02917194]

mean value: 0.0305070161819458

key: score_time
value: [0.0254848  0.03329682 0.01865816 0.02000928 0.02121401 0.01249504
 0.01821375 0.02222514 0.02327013 0.02214599]

mean value: 0.02170131206512451

key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.93743687 0.90748521 0.93548387
 0.90369611 0.90369611 0.90215054 0.80322581]

mean value: 0.9007781314102745

key: train_mcc
value: [0.94283651 0.93585746 0.92124484 0.9354697  0.93563929 0.92494527
 0.91054923 0.93563929 0.92138939 0.92878086]

mean value: 0.929235183258956

key: test_accuracy
value: [0.98387097 0.91935484 0.9516129  0.96774194 0.9516129  0.96774194
 0.9516129  0.9516129  0.95081967 0.90163934]

mean value: 0.9497620306716024

key: train_accuracy
value: [0.97122302 0.9676259  0.96043165 0.9676259  0.9676259  0.96223022
 0.95503597 0.9676259  0.96050269 0.96409336]

mean value: 0.9644020510700955

key: test_fscore
value: [0.98412698 0.92307692 0.95081967 0.96875    0.95384615 0.96774194
 0.95081967 0.95081967 0.95081967 0.9       ]

mean value: 0.9500820685058522

key: train_fscore
value: [0.97163121 0.96819788 0.96099291 0.96797153 0.96808511 0.96283186
 0.95575221 0.96808511 0.96099291 0.96478873]

mean value: 0.9649329447341147

key: test_precision
value: [0.96875    0.88235294 0.96666667 0.93939394 0.91176471 0.96774194
 0.96666667 0.96666667 0.96666667 0.9       ]

mean value: 0.9436670188603301

key: train_precision
value: [0.95804196 0.95138889 0.94755245 0.95774648 0.95454545 0.94773519
 0.94076655 0.95454545 0.94755245 0.94809689]

mean value: 0.9507971757973318

key: test_recall
value: [1.         0.96774194 0.93548387 1.         1.         0.96774194
 0.93548387 0.93548387 0.93548387 0.9       ]

mean value: 0.957741935483871

key: train_recall
value: [0.98561151 0.98561151 0.97482014 0.97841727 0.98201439 0.97841727
 0.97122302 0.98201439 0.97482014 0.98207885]

mean value: 0.9795028493334365

key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129  0.96774194 0.9516129  0.96774194
 0.9516129  0.9516129  0.95107527 0.9016129 ]

mean value: 0.9497849462365592

key: train_roc_auc
value: [0.97122302 0.9676259  0.96043165 0.9676259  0.9676259  0.96223022
 0.95503597 0.9676259  0.96052835 0.96406101]

mean value: 0.9644013821201104

key: test_jcc
value: [0.96875    0.85714286 0.90625    0.93939394 0.91176471 0.9375
 0.90625    0.90625    0.90625    0.81818182]

mean value: 0.9057733320600968

key: train_jcc
value: [0.94482759 0.93835616 0.92491468 0.93793103 0.93814433 0.92832765
 0.91525424 0.93814433 0.92491468 0.93197279]

mean value: 0.9322787467857844

MCC on Blind test: 0.19

Accuracy on Blind test: 0.44
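The keys in each block (fit_time, score_time, test_*, train_*) match the dictionary that scikit-learn's cross_validate returns when it is given a dict of named scorers and return_train_score=True; the scorer definitions themselves are not shown in this log. Every logged fold metric can be reproduced from that fold's confusion counts. For fold 9 of the Ridge Classifier run above, the counts TP=29, FN=2, FP=1, TN=29 are inferred from the logged recall (29/31) and precision (29/30), not printed directly. The sketch also shows that test_roc_auc equals balanced accuracy when AUC is computed from hard 0/1 predictions. That would explain why test_roc_auc tracks test_accuracy so closely on these balanced folds, but that the AUC was scored on hard labels is itself an assumption.

```python
import math


def metrics_from_counts(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts (positive class = 1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "balanced_accuracy": (recall + tn / (tn + fp)) / 2,
    }


def auc_from_scores(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Counts inferred from the logged fold-9 ratios (not printed in the log itself)
tp, fn, fp, tn = 29, 2, 1, 29
m = metrics_from_counts(tp, fp, fn, tn)

# Hard 0/1 predictions consistent with those counts
y_true = [1] * (tp + fn) + [0] * (fp + tn)
y_pred = [1] * tp + [0] * fn + [1] * fp + [0] * tn
auc_hard = auc_from_scores(y_true, y_pred)

for name, value in m.items():
    print(f"{name}: {value:.8f}")
print(f"roc_auc (hard labels): {auc_hard:.8f}")
```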

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.25432348 0.2824018  0.20859909 0.19930434 0.2196362  0.19928908
 0.19986963 0.19727373 0.24380755 0.21062398]

mean value: 0.22151288986206055

key: score_time
value: [0.02141023 0.0218761  0.02016091 0.01457238 0.01933861 0.01388955
 0.01085019 0.01082468 0.0215745  0.02148724]

mean value: 0.017598438262939452

key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.93743687 0.87278605 0.96824584
 0.93548387 0.90369611 0.9344086  0.83638369]

mean value: 0.9103047822115174

key: train_mcc
value: [0.94283651 0.94283651 0.93563929 0.9354697  0.93900081 0.92844206
 0.93238486 0.9393413  0.93207468 0.9355825 ]

mean value: 0.9363608223697346

key: test_accuracy
value: [0.98387097 0.91935484 0.9516129  0.96774194 0.93548387 0.98387097
 0.96774194 0.9516129  0.96721311 0.91803279]

mean value: 0.9546536224219989

key: train_accuracy
value: [0.97122302 0.97122302 0.9676259  0.9676259  0.96942446 0.96402878
 0.96582734 0.96942446 0.96588869 0.96768402]

mean value: 0.9679975588649368

key: test_fscore
value: [0.98412698 0.92307692 0.95081967 0.96875    0.9375     0.98360656
 0.96774194 0.95081967 0.96774194 0.91525424]

mean value: 0.9549437917099128

key: train_fscore
value: [0.97163121 0.97163121 0.96808511 0.96797153 0.96969697 0.96453901
 0.9664903  0.9699115  0.96625222 0.96808511]

mean value: 0.9684294155648834

key: test_precision
value: [0.96875    0.88235294 0.96666667 0.93939394 0.90909091 1.
 0.96774194 0.96666667 0.96774194 0.93103448]

mean value: 0.9499439476721016

key: train_precision
value: [0.95804196 0.95804196 0.95454545 0.95774648 0.96113074 0.95104895
 0.94809689 0.95470383 0.95438596 0.95789474]

mean value: 0.9555636962921179

key: test_recall
value: [1.         0.96774194 0.93548387 1.         0.96774194 0.96774194
 0.96774194 0.93548387 0.96774194 0.9       ]

mean value: 0.9609677419354838

key: train_recall
value: [0.98561151 0.98561151 0.98201439 0.97841727 0.97841727 0.97841727
 0.98561151 0.98561151 0.97841727 0.97849462]

mean value: 0.9816624120058792

key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129  0.96774194 0.93548387 0.98387097
 0.96774194 0.9516129  0.9672043  0.91774194]

mean value: 0.9546236559139786

key: train_roc_auc
value: [0.97122302 0.97122302 0.9676259  0.9676259  0.96942446 0.96402878
 0.96582734 0.96942446 0.96591114 0.96766458]

mean value: 0.9679978597766948

key: test_jcc
value: [0.96875    0.85714286 0.90625    0.93939394 0.88235294 0.96774194
 0.9375     0.90625    0.9375     0.84375   ]

mean value: 0.9146631673197139

key: train_jcc
value: [0.94482759 0.94482759 0.93814433 0.93793103 0.94117647 0.93150685
 0.93515358 0.94158076 0.9347079  0.93814433]

mean value: 0.9388000430005232

MCC on Blind test: 0.15

Accuracy on Blind test: 0.38
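The two SettingWithCopyWarning messages printed during this run come from katg_config.py lines 163 and 166, where ros_CT and ros_BT are sorted with inplace=True. pandas raises this warning when it suspects the frame being modified is a slice of another DataFrame. How ros_CT and ros_BT are built is not shown, so the summary frame below is hypothetical; its values are the rounded mean CV test MCC and blind-test MCC printed for the three models that share this run's fold sizes (QDA, Ridge Classifier, Ridge ClassifierCV). The sketch shows two common ways to avoid the warning: take an explicit .copy() before sorting in place, or drop inplace and rebind the name to the sorted result.

```python
import pandas as pd

# Hypothetical summary frame with rounded values from the runs above
summary = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.678, 0.901, 0.910],
    "bts_mcc": [0.12, 0.19, 0.15],
})

# Option 1: explicit copy of the column subset, then sort it in place
ros_CT = summary[["model", "test_mcc"]].copy()
ros_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: no inplace; rebind the name to the sorted frame
ros_BT = summary[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)

print(ros_CT["model"].tolist())  # ['Ridge ClassifierCV', 'Ridge Classifier', 'QDA']
print(ros_BT["model"].tolist())  # ['Ridge Classifier', 'Ridge ClassifierCV', 'QDA']
```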

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02441096 0.02007389 0.02128315 0.01857615 0.01912689 0.01964045
 0.02089906 0.01830864 0.02194476 0.0219276 ]

mean value: 0.02061915397644043

key: score_time
value: [0.01061249 0.01058674 0.01089358 0.01047206 0.01052213 0.01050425
 0.01066399 0.01052094 0.01055193 0.01059175]

mean value: 0.010591983795166016

key: test_mcc
value: [0.56360186 0.56360186 0.75       0.68884672 0.8819171  0.82717019
 0.9375     0.87083333 0.80753845 0.82078268]

mean value: 0.7711792204154371

key: train_mcc
value: [0.83904826 0.83305418 0.804094   0.83230783 0.81084496 0.79737782
 0.84634011 0.79137125 0.81153605 0.79748625]

mean value: 0.8163460715624157

key: test_accuracy
value: [0.78125    0.78125    0.875      0.84375    0.9375     0.90625
 0.96774194 0.93548387 0.90322581 0.90322581]

mean value: 0.8834677419354838

key: train_accuracy
value: [0.91901408 0.91549296 0.90140845 0.91549296 0.90492958 0.89788732
 0.92280702 0.89473684 0.90526316 0.89824561]

mean value: 0.9075277983691623

key: test_fscore
value: [0.78787879 0.77419355 0.875      0.84848485 0.94117647 0.91428571
 0.96774194 0.93333333 0.90909091 0.91428571]

mean value: 0.886547126181851

key: train_fscore
value: [0.9209622  0.91836735 0.90410959 0.91780822 0.90721649 0.90102389
 0.92465753 0.89864865 0.90721649 0.90034364]

mean value: 0.9100354060453281

key: test_precision
value: [0.76470588 0.8        0.875      0.82352941 0.88888889 0.84210526
 0.9375     0.93333333 0.88235294 0.84210526]

mean value: 0.858952098383213

key: train_precision
value: [0.89932886 0.88815789 0.88       0.89333333 0.88590604 0.87417219
 0.90604027 0.86928105 0.88590604 0.87919463]

mean value: 0.8861320298178448

key: test_recall
value: [0.8125     0.75       0.875      0.875      1.         1.
 1.         0.93333333 0.9375     1.        ]

mean value: 0.9183333333333333

key: train_recall
value: [0.94366197 0.95070423 0.92957746 0.94366197 0.92957746 0.92957746
 0.94405594 0.93006993 0.92957746 0.92253521]

mean value: 0.9352999113562493

key: test_roc_auc
value: [0.78125    0.78125    0.875      0.84375    0.9375     0.90625
 0.96875    0.93541667 0.90208333 0.9       ]

mean value: 0.883125

key: train_roc_auc
value: [0.91901408 0.91549296 0.90140845 0.91549296 0.90492958 0.89788732
 0.9227322  0.89461243 0.90534817 0.89833054]

mean value: 0.9075248694967005

key: test_jcc
value: [0.65       0.63157895 0.77777778 0.73684211 0.88888889 0.84210526
 0.9375     0.875      0.83333333 0.84210526]

mean value: 0.8015131578947369

key: train_jcc
value: [0.85350318 0.8490566  0.825      0.84810127 0.83018868 0.81987578
 0.85987261 0.81595092 0.83018868 0.81875   ]

mean value: 0.8350487720908194

MCC on Blind test: 0.22

Accuracy on Blind test: 0.54
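Each "mean value" line is the unweighted arithmetic mean of the ten per-fold scores above it. The stdlib sketch below recomputes it for the Logistic Regression test_mcc values, which are copied from the log and rounded there to 8 decimals. It also reports the fold-to-fold standard deviation and the distance to the blind-test MCC, which is printed rounded to two decimals. How much weight to put on that gap is a judgment for the analysis, not something this calculation settles.

```python
import statistics

# Per-fold test MCC for the Logistic Regression run, copied from the log above
test_mcc = [0.56360186, 0.56360186, 0.75, 0.68884672, 0.8819171,
            0.82717019, 0.9375, 0.87083333, 0.80753845, 0.82078268]
blind_mcc = 0.22  # "MCC on Blind test" for the same model (rounded in the log)

cv_mean = statistics.mean(test_mcc)   # matches the logged "mean value"
cv_sd = statistics.stdev(test_mcc)    # sample standard deviation across folds
gap = cv_mean - blind_mcc

print(f"CV mean MCC {cv_mean:.4f} (sd {cv_sd:.4f}); blind-test MCC {blind_mcc:.2f}; gap {gap:.2f}")
```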

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
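The list above pairs a display name with each estimator, and each entry is later reported as `Model_name:` / `Model func:`. Below is a hypothetical sketch of such a loop, with only three of the models and synthetic data. The scaler-plus-model pipeline and the `results` dict are assumptions for illustration, not the script's code.

```python
# Hypothetical sketch: iterate over (name, estimator) pairs like the list above
# and record a cross-validated MCC for each. Data are synthetic (assumption).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

models = [
    ("Logistic Regression", LogisticRegression(random_state=42)),
    ("Gaussian NB", GaussianNB()),
    ("Naive Bayes", BernoulliNB()),
]

results = {}
for name, estimator in models:
    print(f"Model_name: {name}")
    print(f"Model func: {estimator}")
    pipe = Pipeline(steps=[("prep", MinMaxScaler()), ("model", estimator)])
    mcc = cross_val_score(pipe, X, y, cv=10, scoring=make_scorer(matthews_corrcoef))
    results[name] = float(np.mean(mcc))  # mean MCC over the 10 folds
```

Because the printed list contains `('Naive Bayes', BernoulliNB())` twice, a list-based loop fits that model twice. Keying results by name, as here, keeps only the last run.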
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
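The pipeline printed above min-max scales the numeric columns, one-hot encodes the categorical ones, passes any remaining columns through unchanged, and then fits `LogisticRegressionCV`. Below is a minimal sketch of the same structure on a toy table. It uses only a few column names taken from the log. The category levels (`helix`/`strand`/`coil`, `yes`/`no`), the random values, the toy target and `max_iter=1000` are all assumptions.

```python
# Sketch of the printed pipeline structure on a toy table (values are random;
# category levels and target are hypothetical, column names come from the log).
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "coil"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
# Noisy toy target loosely tied to ligand_distance.
y = ((X["ligand_distance"] / 40 + rng.normal(0, 0.3, n)) < 0.5).astype(int)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",                 # unlisted columns pass through unchanged
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(max_iter=1000, random_state=42)),  # max_iter raised from 100
])
pipe.fit(X, y)
pred = pipe.predict(X)
n_out = pipe.named_steps["prep"].transform(X).shape[1]  # 2 numeric + 3 + 2 one-hot columns
```

With the default `OneHotEncoder()`, a category level that appears only in a later test set raises an error at `transform` time. `OneHotEncoder(handle_unknown="ignore")` is one option if that happens.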

key: fit_time
value: [0.582335   0.77262783 0.64726782 0.610358   0.68636823 0.73911905
 0.64701462 0.70605779 0.70633364 0.64915323]

mean value: 0.6746635198593139

key: score_time
value: [0.02002954 0.01183629 0.01188588 0.01181579 0.01440883 0.01112986
 0.01124287 0.01208949 0.01121664 0.01208615]

mean value: 0.012774133682250976

key: test_mcc
value: [0.68884672 0.68884672 0.81409158 0.93933644 0.93933644 0.93933644
 0.87866878 1.         0.87083333 0.87770745]

mean value: 0.8637003892680549

key: train_mcc
value: [1.         0.99298237 0.94375558 0.93720088 0.97192739 0.95129413
 0.96512319 0.93704438 0.9720266  0.9582759 ]

mean value: 0.9629630433458233

key: test_accuracy
value: [0.84375    0.84375    0.90625    0.96875    0.96875    0.96875
 0.93548387 1.         0.93548387 0.93548387]

mean value: 0.9306451612903226

key: train_accuracy
value: [1.         0.99647887 0.97183099 0.96830986 0.98591549 0.97535211
 0.98245614 0.96842105 0.98596491 0.97894737]

mean value: 0.9813676797627873

key: test_fscore
value: [0.83870968 0.83870968 0.90322581 0.96774194 0.96969697 0.96774194
 0.9375     1.         0.9375     0.94117647]

mean value: 0.9302002472543269

key: train_fscore
value: [1.         0.99646643 0.97202797 0.96885813 0.98601399 0.97577855
 0.98269896 0.96885813 0.98601399 0.97916667]

mean value: 0.9815882813444314

key: test_precision
value: [0.86666667 0.86666667 0.93333333 1.         0.94117647 1.
 0.88235294 1.         0.9375     0.88888889]

mean value: 0.9316584967320262

key: train_precision
value: [1.         1.         0.96527778 0.95238095 0.97916667 0.95918367
 0.97260274 0.95890411 0.97916667 0.96575342]

mean value: 0.9732436010934054

key: test_recall
value: [0.8125 0.8125 0.875  0.9375 1.     0.9375 1.     1.     0.9375 1.    ]

mean value: 0.93125
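Each "mean value" line appears to be the plain, unweighted arithmetic mean of the ten per-fold values above it. A quick NumPy check, using the test_recall fold values copied from the LogisticRegressionCV block:

```python
import numpy as np

# Per-fold test_recall values copied from the LogisticRegressionCV block above
test_recall = np.array([0.8125, 0.8125, 0.875, 0.9375, 1.0,
                        0.9375, 1.0, 1.0, 0.9375, 1.0])

# Unweighted mean over the 10 folds, so folds of different size count equally
mean_recall = test_recall.mean()
print(mean_recall)
```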

key: train_recall
value: [1.         0.99295775 0.97887324 0.98591549 0.99295775 0.99295775
 0.99300699 0.97902098 0.99295775 0.99295775]

mean value: 0.9901605436816705

key: test_roc_auc
value: [0.84375    0.84375    0.90625    0.96875    0.96875    0.96875
 0.9375     1.         0.93541667 0.93333333]

mean value: 0.930625

key: train_roc_auc
value: [1.         0.99647887 0.97183099 0.96830986 0.98591549 0.97535211
 0.98241899 0.96838373 0.98598936 0.97899636]

mean value: 0.981367576085886

key: test_jcc
value: [0.72222222 0.72222222 0.82352941 0.9375     0.94117647 0.9375
 0.88235294 1.         0.88235294 0.88888889]

mean value: 0.8737745098039216

key: train_jcc
value: [1.         0.99295775 0.94557823 0.93959732 0.97241379 0.9527027
 0.96598639 0.93959732 0.97241379 0.95918367]

mean value: 0.9640430965580683
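The "key / value / mean value" layout matches what scikit-learn's cross_validate returns when it is given a dict of named scorers and return_train_score=True: fit_time, score_time, then a test_ and train_ array per scorer. The sketch below reproduces that layout on synthetic data. The scorer names, the synthetic dataset and the StratifiedKFold splitter are assumptions chosen to mirror the key names in this log; the log itself does not show how its scores were computed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real training data (assumption)
X, y = make_classification(n_samples=320, n_features=20, random_state=42)

# Hypothetical scorer names picked to reproduce the log's key names
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```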

MCC on Blind test: 0.17

Accuracy on Blind test: 0.43
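The blind-test lines report a single MCC and accuracy, rounded to two decimals, on data held out from cross-validation; here they sit well below the cross-validated means (MCC 0.17 against a CV mean of about 0.86). MCC is computed from the confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The labels and predictions below are a toy example for illustration only, not the blind-test data.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Toy blind-test labels and predictions (illustrative only)
# Confusion matrix: TP=3, FN=1, TN=2, FP=2
y_blind = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

mcc = matthews_corrcoef(y_blind, y_pred)
acc = accuracy_score(y_blind, y_pred)

# Report to two decimals, as in the "on Blind test" lines
print(f"MCC on Blind test: {mcc:.2f}")
print(f"Accuracy on Blind test: {acc:.2f}")
```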

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])
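The pipeline repr above shows a ColumnTransformer that rescales the numeric columns with MinMaxScaler, one-hot encodes the categorical columns and passes any remaining columns through, followed by the classifier. A minimal sketch of the same structure on a tiny hypothetical frame; the column names are borrowed from the log but the values and labels are invented.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Tiny hypothetical frame: two numeric features and one categorical feature
df = pd.DataFrame({
    "ligand_distance": [2.0, 10.0, 6.0, 4.0],
    "rsa": [0.1, 0.9, 0.5, 0.3],
    "ss_class": ["helix", "sheet", "helix", "coil"],
})
y = np.array([1, 0, 1, 0])

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),  # rescale each to [0, 1]
        ("cat", OneHotEncoder(), ["ss_class"]),               # one column per category
    ],
    remainder="passthrough",  # unlisted columns would be kept unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])
pipe.fit(df, y)

# Inspect the transformed matrix the classifier actually sees
Xt = np.asarray(pipe.named_steps["prep"].transform(df))
print(Xt)
```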

key: fit_time
value: [0.00992465 0.00951338 0.00734925 0.007375   0.00736427 0.00711226
 0.007195   0.00721049 0.00700927 0.00702572]

mean value: 0.007707929611206055

key: score_time
value: [0.0107615  0.00938344 0.00825024 0.0080514  0.00798678 0.00803542
 0.00796485 0.00782204 0.00787354 0.00787258]

mean value: 0.008400177955627442

key: test_mcc
value: [0.625      0.62994079 0.62994079 0.68884672 0.75592895 0.75592895
 0.69203857 0.6125     0.69203857 0.82078268]

mean value: 0.6902946005610465

key: train_mcc
value: [0.74714613 0.73268511 0.71170894 0.74714613 0.71859502 0.73355944
 0.73273302 0.71308876 0.7285593  0.7124563 ]

mean value: 0.7277678155822408

key: test_accuracy
value: [0.8125     0.8125     0.8125     0.84375    0.875      0.875
 0.83870968 0.80645161 0.83870968 0.90322581]

mean value: 0.8418346774193548

key: train_accuracy
value: [0.87323944 0.86619718 0.8556338  0.87323944 0.85915493 0.86619718
 0.86315789 0.85614035 0.86315789 0.85614035]

mean value: 0.8632258463059056

key: test_fscore
value: [0.8125     0.82352941 0.82352941 0.84848485 0.88235294 0.88235294
 0.84848485 0.8        0.82758621 0.91428571]

mean value: 0.8463106324034316

key: train_fscore
value: [0.87586207 0.86805556 0.85813149 0.87586207 0.86111111 0.86986301
 0.87213115 0.86006826 0.86779661 0.85714286]

mean value: 0.8666024180424603

key: test_precision
value: [0.8125     0.77777778 0.77777778 0.82352941 0.83333333 0.83333333
 0.77777778 0.8        0.92307692 0.84210526]

mean value: 0.8201211597999524

key: train_precision
value: [0.85810811 0.85616438 0.84353741 0.85810811 0.84931507 0.84666667
 0.82098765 0.84       0.83660131 0.84827586]

mean value: 0.845776457348316

key: test_recall
value: [0.8125     0.875      0.875      0.875      0.9375     0.9375
 0.93333333 0.8        0.75       1.        ]

mean value: 0.8795833333333334

key: train_recall
value: [0.8943662  0.88028169 0.87323944 0.8943662  0.87323944 0.8943662
 0.93006993 0.88111888 0.90140845 0.86619718]

mean value: 0.8888653599921206

key: test_roc_auc
value: [0.8125     0.8125     0.8125     0.84375    0.875      0.875
 0.84166667 0.80625    0.84166667 0.9       ]

mean value: 0.8420833333333333

key: train_roc_auc
value: [0.87323944 0.86619718 0.8556338  0.87323944 0.85915493 0.86619718
 0.86292229 0.8560524  0.86329164 0.85617551]

mean value: 0.8632103811681276

key: test_jcc
value: [0.68421053 0.7        0.7        0.73684211 0.78947368 0.78947368
 0.73684211 0.66666667 0.70588235 0.84210526]

mean value: 0.7351496388028895
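Within each fold, test_jcc is fully determined by test_fscore: with J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), it follows that J = F1/(2 - F1). The check below applies that identity to the Gaussian NB fold values from this log.

```python
import numpy as np

def jaccard_from_f1(f1):
    """Convert binary F1 scores to the Jaccard index on the same predictions.

    J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN) give J = F1 / (2 - F1).
    """
    f1 = np.asarray(f1, dtype=float)
    return f1 / (2.0 - f1)

# Per-fold test_fscore and test_jcc values from the Gaussian NB block above
test_fscore = [0.8125, 0.82352941, 0.82352941, 0.84848485, 0.88235294,
               0.88235294, 0.84848485, 0.8, 0.82758621, 0.91428571]
test_jcc = [0.68421053, 0.7, 0.7, 0.73684211, 0.78947368,
            0.78947368, 0.73684211, 0.66666667, 0.70588235, 0.84210526]

derived = jaccard_from_f1(test_fscore)
print(np.round(derived, 6))
```

The identity holds fold by fold, but not between the two "mean value" lines, because the mapping is non-linear.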

key: train_jcc
value: [0.7791411  0.76687117 0.75151515 0.7791411  0.75609756 0.76969697
 0.77325581 0.75449102 0.76646707 0.75      ]

mean value: 0.7646676954206684

MCC on Blind test: 0.22

Accuracy on Blind test: 0.59

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.0074749  0.00728703 0.00717139 0.00743604 0.00727081 0.00725842
 0.00724769 0.00720358 0.00726557 0.00720382]

mean value: 0.0072819232940673825

key: score_time
value: [0.00798202 0.00782013 0.00794554 0.00785279 0.00791287 0.00786495
 0.0078876  0.00792551 0.00783515 0.00804877]

mean value: 0.007907533645629882

key: test_mcc
value: [0.68884672 0.56360186 0.68884672 0.625      0.438357   0.68884672
 0.48954403 0.48333333 0.55573827 0.55573827]

mean value: 0.5777852941864914

key: train_mcc
value: [0.64814452 0.64814452 0.6479516  0.63405443 0.65572679 0.62714946
 0.62393794 0.65616074 0.64212548 0.6494089 ]

mean value: 0.6432804381067745

key: test_accuracy
value: [0.84375    0.78125    0.84375    0.8125     0.71875    0.84375
 0.74193548 0.74193548 0.77419355 0.77419355]

mean value: 0.7876008064516129

key: train_accuracy
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
 0.81052632 0.82807018 0.82105263 0.8245614 ]

mean value: 0.8213787991104522

key: test_fscore
value: [0.83870968 0.77419355 0.84848485 0.8125     0.70967742 0.84848485
 0.75       0.73333333 0.8        0.8       ]

mean value: 0.791538367546432

key: train_fscore
value: [0.82638889 0.82638889 0.82269504 0.81944444 0.83161512 0.816609
 0.82       0.82807018 0.82105263 0.82638889]

mean value: 0.8238653070404355

key: test_precision
value: [0.86666667 0.8        0.82352941 0.8125     0.73333333 0.82352941
 0.70588235 0.73333333 0.73684211 0.73684211]

mean value: 0.7772458720330238

key: train_precision
value: [0.81506849 0.81506849 0.82857143 0.80821918 0.81208054 0.80272109
 0.78343949 0.83098592 0.81818182 0.81506849]

mean value: 0.8129404935574437

key: test_recall
value: [0.8125     0.75       0.875      0.8125     0.6875     0.875
 0.8        0.73333333 0.875      0.875     ]

mean value: 0.8095833333333333

key: train_recall
value: [0.83802817 0.83802817 0.81690141 0.83098592 0.85211268 0.83098592
 0.86013986 0.82517483 0.82394366 0.83802817]

mean value: 0.8354328769821727

key: test_roc_auc
value: [0.84375    0.78125    0.84375    0.8125     0.71875    0.84375
 0.74375    0.74166667 0.77083333 0.77083333]

mean value: 0.7870833333333334

key: train_roc_auc
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
 0.81035162 0.82808037 0.82106274 0.82460849]

mean value: 0.8213680685511672

key: test_jcc
value: [0.72222222 0.63157895 0.73684211 0.68421053 0.55       0.73684211
 0.6        0.57894737 0.66666667 0.66666667]

mean value: 0.6573976608187134

key: train_jcc
value: [0.70414201 0.70414201 0.69879518 0.69411765 0.71176471 0.69005848
 0.69491525 0.70658683 0.69642857 0.70414201]

mean value: 0.7005092700712355

MCC on Blind test: 0.19

Accuracy on Blind test: 0.54

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00720549 0.00690293 0.00749421 0.0067699  0.00747395 0.00748181
 0.00723648 0.00759244 0.0074594  0.00673008]

mean value: 0.0072346687316894535

key: score_time
value: [0.01040697 0.01126409 0.01092076 0.01008987 0.01062059 0.01053739
 0.01394534 0.01144624 0.01064205 0.01186824]

mean value: 0.011174154281616212

key: test_mcc
value: [0.62994079 0.31311215 0.56360186 0.56360186 0.31814238 0.82717019
 0.82285074 0.67916667 0.57461167 0.68826048]

mean value: 0.5980458781591939

key: train_mcc
value: [0.71838112 0.74655293 0.7253701  0.74655293 0.71142639 0.68311553
 0.69826652 0.67718901 0.70556653 0.67774254]

mean value: 0.7090163590202463

key: test_accuracy
value: [0.8125     0.65625    0.78125    0.78125    0.65625    0.90625
 0.90322581 0.83870968 0.77419355 0.83870968]

mean value: 0.794858870967742

key: train_accuracy
value: [0.85915493 0.87323944 0.86267606 0.87323944 0.8556338  0.8415493
 0.84912281 0.83859649 0.85263158 0.83859649]

mean value: 0.8544440326167532

key: test_fscore
value: [0.8        0.66666667 0.78787879 0.77419355 0.62068966 0.91428571
 0.90909091 0.83870968 0.81081081 0.85714286]

mean value: 0.7979468626854611

key: train_fscore
value: [0.85815603 0.87412587 0.86315789 0.87412587 0.85714286 0.8409894
 0.84912281 0.83916084 0.85416667 0.83453237]

mean value: 0.8544680614739297

key: test_precision
value: [0.85714286 0.64705882 0.76470588 0.8        0.69230769 0.84210526
 0.83333333 0.8125     0.71428571 0.78947368]

mean value: 0.7752913250320371

key: train_precision
value: [0.86428571 0.86805556 0.86013986 0.86805556 0.84827586 0.84397163
 0.85211268 0.83916084 0.84246575 0.85294118]

mean value: 0.8539464623923748

key: test_recall
value: [0.75       0.6875     0.8125     0.75       0.5625     1.
 1.         0.86666667 0.9375     0.9375    ]

mean value: 0.8304166666666667

key: train_recall
value: [0.85211268 0.88028169 0.86619718 0.88028169 0.86619718 0.83802817
 0.84615385 0.83916084 0.86619718 0.81690141]

mean value: 0.8551511868413277

key: test_roc_auc
value: [0.8125     0.65625    0.78125    0.78125    0.65625    0.90625
 0.90625    0.83958333 0.76875    0.83541667]

mean value: 0.794375

key: train_roc_auc
value: [0.85915493 0.87323944 0.86267606 0.87323944 0.8556338  0.8415493
 0.84913326 0.8385945  0.85267901 0.83852063]

mean value: 0.854442036836403

key: test_jcc
value: [0.66666667 0.5        0.65       0.63157895 0.45       0.84210526
 0.83333333 0.72222222 0.68181818 0.75      ]

mean value: 0.672772461456672

key: train_jcc
value: [0.7515528  0.77639752 0.75925926 0.77639752 0.75       0.72560976
 0.73780488 0.72289157 0.74545455 0.71604938]

mean value: 0.7461417213928212

MCC on Blind test: 0.18

Accuracy on Blind test: 0.56

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01138067 0.01109099 0.01127625 0.01128221 0.01073098 0.0110662
 0.01142335 0.01140451 0.01070118 0.01133704]

mean value: 0.011169338226318359

key: score_time
value: [0.00933075 0.00924778 0.00916719 0.00923133 0.00920081 0.00921702
 0.00934625 0.00917053 0.00837874 0.0092051 ]

mean value: 0.009149551391601562

key: test_mcc
value: [0.625      0.50395263 0.57265629 0.64549722 0.81409158 0.77459667
 0.76948376 0.80833333 0.6310315  0.76594169]

mean value: 0.6910584675818924

key: train_mcc
value: [0.7618988  0.7476577  0.76035829 0.75897979 0.73060671 0.72554232
 0.7375982  0.72956319 0.72987459 0.71397006]

mean value: 0.7396049640194965

key: test_accuracy
value: [0.8125     0.75       0.78125    0.8125     0.90625    0.875
 0.87096774 0.90322581 0.80645161 0.87096774]

mean value: 0.8389112903225806

key: train_accuracy
value: [0.87676056 0.86971831 0.87676056 0.87676056 0.86267606 0.85915493
 0.86315789 0.85964912 0.85964912 0.85263158]

mean value: 0.8656918705213739

key: test_fscore
value: [0.8125     0.76470588 0.8        0.83333333 0.90909091 0.88888889
 0.88235294 0.90322581 0.83333333 0.88888889]

mean value: 0.8516319983516378

key: train_fscore
value: [0.8852459  0.87868852 0.88448845 0.88372093 0.87043189 0.86842105
 0.87459807 0.87096774 0.87012987 0.8627451 ]

mean value: 0.8749437532470357

key: test_precision
value: [0.8125     0.72222222 0.73684211 0.75       0.88235294 0.8
 0.78947368 0.875      0.75       0.8       ]

mean value: 0.7918390952872377

key: train_precision
value: [0.82822086 0.82208589 0.83229814 0.83647799 0.82389937 0.81481481
 0.80952381 0.80838323 0.80722892 0.80487805]

mean value: 0.8187811065917483

key: test_recall
value: [0.8125     0.8125     0.875      0.9375     0.9375     1.
 1.         0.93333333 0.9375     1.        ]

mean value: 0.9245833333333333

key: train_recall
value: [0.95070423 0.94366197 0.94366197 0.93661972 0.92253521 0.92957746
 0.95104895 0.94405594 0.94366197 0.92957746]

mean value: 0.9395104895104895

key: test_roc_auc
value: [0.8125     0.75       0.78125    0.8125     0.90625    0.875
 0.875      0.90416667 0.80208333 0.86666667]

mean value: 0.8385416666666666

key: train_roc_auc
value: [0.87676056 0.86971831 0.87676056 0.87676056 0.86267606 0.85915493
 0.86284842 0.85935192 0.85994287 0.85290062]

mean value: 0.865687481532552

key: test_jcc
value: [0.68421053 0.61904762 0.66666667 0.71428571 0.83333333 0.8
 0.78947368 0.82352941 0.71428571 0.8       ]

mean value: 0.744483266991007

key: train_jcc
value: [0.79411765 0.78362573 0.79289941 0.79166667 0.77058824 0.76744186
 0.77714286 0.77142857 0.77011494 0.75862069]

mean value: 0.7777646609518236

MCC on Blind test: 0.23

Accuracy on Blind test: 0.48

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])
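The ConvergenceWarning emitted during the MLP run means the stochastic optimizer stopped at max_iter=500 before meeting its tolerance. A sketch of recording such warnings programmatically instead of letting them interleave with stdout: it deliberately uses a tiny max_iter on synthetic data (both assumptions) so the warning is guaranteed. Raising max_iter or adjusting tol and learning_rate_init are the usual levers; whether either helps on this data is not shown in the log.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Deliberately tiny max_iter so the adam optimizer stops before converging
clf = MLPClassifier(max_iter=5, random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, including repeats
    clf.fit(X, y)

# Count only convergence warnings among everything that was recorded
n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(f"ConvergenceWarnings raised: {n_convergence}")
print(f"iterations run: {clf.n_iter_}")
```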

key: fit_time
value: [0.94244218 1.01735759 0.8850472  1.05389714 0.87144995 0.99552441
 0.90028095 0.87230182 1.03594947 0.87329125]

mean value: 0.9447541952133178

key: score_time
value: [0.01177907 0.013484   0.01336789 0.01362443 0.01371074 0.01331043
 0.01345372 0.01344275 0.01364231 0.01379061]

mean value: 0.013360595703125

key: test_mcc
value: [0.68884672 0.68884672 0.69991324 0.875      0.8819171  0.875
 0.80833333 0.9375     0.74166667 0.82078268]

mean value: 0.8017806465017004

key: train_mcc
value: [1.         0.99298237 0.99298237 0.99298237 0.98591549 0.99298237
 0.98596474 0.9789707  0.99300699 0.99300665]

mean value: 0.9908794051074042

key: test_accuracy
value: [0.84375    0.84375    0.84375    0.9375     0.9375     0.9375
 0.90322581 0.96774194 0.87096774 0.90322581]

mean value: 0.8988911290322581

key: train_accuracy
value: [1.         0.99647887 0.99647887 0.99647887 0.99295775 0.99647887
 0.99298246 0.98947368 0.99649123 0.99649123]

mean value: 0.9954311835927848

key: test_fscore
value: [0.84848485 0.83870968 0.85714286 0.9375     0.94117647 0.9375
 0.90322581 0.96774194 0.875      0.91428571]

mean value: 0.9020767309856494

key: train_fscore
value: [1.         0.99646643 0.99646643 0.99646643 0.99295775 0.99646643
 0.99300699 0.98954704 0.99649123 0.99646643]

mean value: 0.99543351613606

key: test_precision
value: [0.82352941 0.86666667 0.78947368 0.9375     0.88888889 0.9375
 0.875      0.9375     0.875      0.84210526]

mean value: 0.8773163914688682

key: train_precision
value: [1.         1.         1.         1.         0.99295775 1.
 0.99300699 0.98611111 0.99300699 1.        ]

mean value: 0.9965082843603971

key: test_recall
value: [0.875      0.8125     0.9375     0.9375     1.         0.9375
 0.93333333 1.         0.875      1.        ]

mean value: 0.9308333333333333

key: train_recall
value: [1.         0.99295775 0.99295775 0.99295775 0.99295775 0.99295775
 0.99300699 0.99300699 1.         0.99295775]

mean value: 0.9943760464887226

key: test_roc_auc
value: [0.84375    0.84375    0.84375    0.9375     0.9375     0.9375
 0.90416667 0.96875    0.87083333 0.9       ]

mean value: 0.89875

key: train_roc_auc
value: [1.         0.99647887 0.99647887 0.99647887 0.99295775 0.99647887
 0.99298237 0.98946124 0.9965035  0.99647887]

mean value: 0.9954299221904855

key: test_jcc
value: [0.73684211 0.72222222 0.75       0.88235294 0.88888889 0.88235294
 0.82352941 0.9375     0.77777778 0.84210526]

mean value: 0.8243571551427589

key: train_jcc
value: [1.         0.99295775 0.99295775 0.99295775 0.98601399 0.99295775
 0.98611111 0.97931034 0.99300699 0.99295775]

mean value: 0.9909231167354042
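The per-fold blocks above (fit_time, score_time, and test_*/train_* pairs, each followed by a mean) match the dictionary that scikit-learn's `cross_validate` returns when it is given a dict of named scorers and `return_train_score=True`. The sketch below shows one way to produce output in that shape. It makes several assumptions. The toy DataFrame, the scorer definitions (for example, `roc_auc` here uses predicted probabilities) and the shuffled 10-fold `StratifiedKFold` are illustrative, not the script's actual code. `handle_unknown="ignore"` is also added to `OneHotEncoder` as a safety choice for small folds.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the real feature table:
# two numerical columns and one categorical column.
rng = np.random.default_rng(42)
n = 320
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "active_site": rng.choice(["yes", "no"], n),
})
y = (X["ligand_distance"] < 20).astype(int)

num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = X.select_dtypes(exclude="number").columns.tolist()

# Scale numerical columns, one-hot encode categorical ones, then fit the model.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
        remainder="passthrough")),
    ("model", DecisionTreeClassifier(random_state=42)),
])

# Scorer names chosen so cross_validate emits keys like test_mcc and train_jcc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each array of per-fold values and its mean, in the log's layout.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```

With a fully grown decision tree, the train_* scores will typically sit at or near 1.0, just as in the Decision Tree block below. The gap between train and test scores is the useful signal.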

MCC on Blind test: 0.18

Accuracy on Blind test: 0.45
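The two blind-test lines are consistent with scoring a model that has already been fitted on a separate, held-out dataset, using MCC and accuracy rounded to two decimals. The helper below is a minimal sketch under that assumption. `blind_test_report` is a hypothetical name, not a function confirmed in the script, and the toy data in the usage lines are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef


def blind_test_report(fitted_model, X_blind, y_blind):
    """Hypothetical helper: score a fitted model on held-out data, rounding to 2 d.p."""
    y_pred = fitted_model.predict(X_blind)
    mcc = round(float(matthews_corrcoef(y_blind, y_pred)), 2)
    acc = round(float(accuracy_score(y_blind, y_pred)), 2)
    print(f"MCC on Blind test: {mcc}\n")
    print(f"Accuracy on Blind test: {acc}\n")
    return mcc, acc


# Illustrative usage on toy data: fit on one sample, score on another.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_blind = rng.normal(size=(50, 3))
y_blind = (X_blind[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)
blind_test_report(model, X_blind, y_blind)
```

In this log, high cross-validated MCC paired with a blind-test MCC near zero suggests that the blind set differs in distribution from the training folds. The sketch only computes the numbers; it does not diagnose that gap.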

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01128316 0.01111746 0.00982285 0.009413   0.00901008 0.00897694
 0.00881934 0.00902772 0.00959873 0.00934815]

mean value: 0.009641742706298828

key: score_time
value: [0.01056886 0.00906277 0.00891089 0.00860476 0.0085628  0.00862837
 0.00838184 0.00824451 0.00853562 0.00857925]

mean value: 0.008807969093322755

key: test_mcc
value: [0.81409158 0.68884672 0.875      1.         0.8819171  0.93933644
 0.9375     1.         0.80833333 0.80753845]

mean value: 0.8752563621702886

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.90625    0.84375    0.9375     1.         0.9375     0.96875
 0.96774194 1.         0.90322581 0.90322581]

mean value: 0.9367943548387097

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.83870968 0.9375     1.         0.94117647 0.96774194
 0.96774194 1.         0.90322581 0.90909091]

mean value: 0.9374277643608763

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88235294 0.86666667 0.9375     1.         0.88888889 1.
 0.9375     1.         0.93333333 0.88235294]

mean value: 0.932859477124183

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9375 0.8125 0.9375 1.     1.     0.9375 1.     1.     0.875  0.9375]

mean value: 0.94375

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90625    0.84375    0.9375     1.         0.9375     0.96875
 0.96875    1.         0.90416667 0.90208333]

mean value: 0.936875

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.72222222 0.88235294 1.         0.88888889 0.9375
 0.9375     1.         0.82352941 0.83333333]

mean value: 0.8858660130718954

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.02

Accuracy on Blind test: 0.22

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.09702897 0.09689069 0.096277   0.09499073 0.09550571 0.09818435
 0.09888387 0.09795642 0.09792018 0.09375334]

mean value: 0.09673912525177002

key: score_time
value: [0.01839042 0.01852298 0.0182128  0.01794076 0.01818895 0.01855779
 0.01845098 0.01811409 0.01863813 0.01832128]

mean value: 0.018333816528320314

key: test_mcc
value: [0.68884672 0.68884672 0.68884672 0.62994079 0.81409158 0.93933644
 0.9375     1.         0.87083333 0.87770745]

mean value: 0.8135949748773968

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.84375    0.84375    0.84375    0.8125     0.90625    0.96875
 0.96774194 1.         0.93548387 0.93548387]

mean value: 0.9057459677419355

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.84848485 0.84848485 0.84848485 0.82352941 0.90909091 0.96969697
 0.96774194 1.         0.9375     0.94117647]

mean value: 0.9094190242079236

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.82352941 0.82352941 0.82352941 0.77777778 0.88235294 0.94117647
 0.9375     1.         0.9375     0.88888889]

mean value: 0.883578431372549

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.875  0.875  0.875  0.875  0.9375 1.     1.     1.     0.9375 1.    ]

mean value: 0.9375

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.84375    0.84375    0.84375    0.8125     0.90625    0.96875
 0.96875    1.         0.93541667 0.93333333]

mean value: 0.905625

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.73684211 0.73684211 0.73684211 0.7        0.83333333 0.94117647
 0.9375     1.         0.88235294 0.88888889]

mean value: 0.8393777949776402

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.23

Accuracy on Blind test: 0.45

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00812912 0.00792956 0.00760221 0.00785184 0.00799227 0.0079267
 0.00804663 0.00806618 0.00824046 0.00799036]

mean value: 0.007977533340454101

key: score_time
value: [0.00857091 0.00843978 0.00853491 0.00855112 0.0085063  0.00849175
 0.00858927 0.00863695 0.00858855 0.00860786]

mean value: 0.008551740646362304

key: test_mcc
value: [0.5        0.69991324 0.50395263 0.77459667 0.82717019 0.82717019
 0.74689528 0.82078268 0.35983579 0.6125    ]

mean value: 0.6672816673588119

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.75       0.84375    0.75       0.875      0.90625    0.90625
 0.87096774 0.90322581 0.67741935 0.80645161]

mean value: 0.8289314516129032

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.75       0.85714286 0.73333333 0.88888889 0.91428571 0.89655172
 0.85714286 0.88888889 0.66666667 0.8125    ]

mean value: 0.8265400930487138

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.75       0.78947368 0.78571429 0.8        0.84210526 1.
 0.92307692 1.         0.71428571 0.8125    ]

mean value: 0.8417155870445344

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.75   0.9375 0.6875 1.     1.     0.8125 0.8    0.8    0.625  0.8125]

mean value: 0.8225

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.75       0.84375    0.75       0.875      0.90625    0.90625
 0.86875    0.9        0.67916667 0.80625   ]

mean value: 0.8285416666666667

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.6        0.75       0.57894737 0.8        0.84210526 0.8125
 0.75       0.8        0.5        0.68421053]

mean value: 0.7117763157894736

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.12

Accuracy on Blind test: 0.49

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.19962478 1.20865035 1.21551681 1.2219255  1.21645331 1.22310948
 1.23031688 1.22133994 1.21362448 1.22138143]

mean value: 1.2171942949295045

key: score_time
value: [0.15371752 0.09660053 0.09662104 0.09716916 0.09705114 0.09664798
 0.09718585 0.09721947 0.09733677 0.09707975]

mean value: 0.10266292095184326

key: test_mcc
value: [0.81409158 0.875      0.875      0.8819171  0.8819171  1.
 0.9375     1.         1.         0.9372467 ]

mean value: 0.9202672483593498

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.90625    0.9375     0.9375     0.9375     0.9375     1.
 0.96774194 1.         1.         0.96774194]

mean value: 0.9591733870967742

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.9375     0.9375     0.94117647 0.94117647 1.
 0.96774194 1.         1.         0.96969697]

mean value: 0.9603882755448221

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88235294 0.9375     0.9375     0.88888889 0.88888889 1.
 0.9375     1.         1.         0.94117647]

mean value: 0.9413807189542484

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9375 0.9375 0.9375 1.     1.     1.     1.     1.     1.     1.    ]

mean value: 0.98125

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90625    0.9375     0.9375     0.9375     0.9375     1.
 0.96875    1.         1.         0.96666667]

mean value: 0.9591666666666667

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.88235294 0.88235294 0.88888889 0.88888889 1.
 0.9375     1.         1.         0.94117647]

mean value: 0.9254493464052287

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.21

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
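The FutureWarning emitted for 'Random Forest2' states that `max_features='auto'` is deprecated and that `max_features='sqrt'` keeps the past behaviour for classifiers. Below is a minimal sketch of the same estimator settings with that single change. The `make_classification` data exist only to show that the model fits. Swapping this into the script's model list is an assumption about how it would be wired in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same settings as 'Random Forest2' in the log, except that max_features='sqrt'
# replaces the deprecated 'auto' (equivalent for classifiers, per the warning).
rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10, oob_score=True,
                             random_state=42)

# Illustrative fit on synthetic data; oob_score=True exposes rf2.oob_score_.
X, y = make_classification(n_samples=100, random_state=0)
rf2.fit(X, y)
print(rf2.oob_score_)
```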

key: fit_time
value: [0.87549925 0.98825598 0.90560436 0.87923479 0.87618375 0.90693331
 0.90976977 0.86167288 0.91743851 0.89844847]

mean value: 0.9019041061401367

key: score_time
value: [0.26297545 0.16814804 0.23696327 0.21888471 0.23352385 0.23716521
 0.25041318 0.24322152 0.20568323 0.21444941]

mean value: 0.2271427869796753

key: test_mcc
value: [0.68884672 0.875      0.81409158 0.8819171  0.8819171  0.93933644
 0.9375     1.         1.         0.9372467 ]

mean value: 0.8955855640414887

key: train_mcc
value: [0.96500412 0.95812669 0.95091647 0.93775982 0.94403659 0.94403659
 0.95108379 0.94422558 0.94423649 0.94423649]

mean value: 0.9483662624447999

key: test_accuracy
value: [0.84375    0.9375     0.90625    0.9375     0.9375     0.96875
 0.96774194 1.         1.         0.96774194]

mean value: 0.9466733870967742

key: train_accuracy
value: [0.98239437 0.97887324 0.97535211 0.96830986 0.97183099 0.97183099
 0.9754386  0.97192982 0.97192982 0.97192982]

mean value: 0.9739819619471214

key: test_fscore
value: [0.84848485 0.9375     0.90909091 0.94117647 0.94117647 0.96969697
 0.96774194 1.         1.         0.96969697]

mean value: 0.9484564573630039

key: train_fscore
value: [0.9825784  0.97916667 0.97560976 0.96907216 0.97222222 0.97222222
 0.97577855 0.97241379 0.97222222 0.97222222]

mean value: 0.9743508213630365

key: test_precision
value: [0.82352941 0.9375     0.88235294 0.88888889 0.88888889 0.94117647
 0.9375     1.         1.         0.94117647]

mean value: 0.9241013071895424

key: train_precision
value: [0.97241379 0.96575342 0.96551724 0.94630872 0.95890411 0.95890411
 0.96575342 0.95918367 0.95890411 0.95890411]

mean value: 0.9610546720455594

key: test_recall
value: [0.875  0.9375 0.9375 1.     1.     1.     1.     1.     1.     1.    ]

mean value: 0.975

key: train_recall
value: [0.99295775 0.99295775 0.98591549 0.99295775 0.98591549 0.98591549
 0.98601399 0.98601399 0.98591549 0.98591549]

mean value: 0.9880478676253325

key: test_roc_auc
value: [0.84375    0.9375     0.90625    0.9375     0.9375     0.96875
 0.96875    1.         1.         0.96666667]

mean value: 0.9466666666666667

key: train_roc_auc
value: [0.98239437 0.97887324 0.97535211 0.96830986 0.97183099 0.97183099
 0.97540136 0.97188023 0.97197873 0.97197873]

mean value: 0.9739830591943268
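In some folds, test_roc_auc differs slightly from test_accuracy. For example, fold 7 shows 0.96875 for test_roc_auc against 0.96774194 for test_accuracy. These numbers are consistent with an AUC computed from hard 0/1 predictions rather than from probabilities. A ROC curve with a single operating point has an area of (TPR + TNR) / 2, which is the balanced accuracy.

The check below assumes the following counts for fold 7: TP = 15, FN = 0, TN = 15, FP = 1. For fold 10 it assumes TP = 16, FN = 0, TN = 14, FP = 1. These counts are not read from the log. They are inferred from the logged recall (1.0), precision (15/16 and 16/17) and accuracy (30/31).

```python
def hard_label_auc(tp, fn, tn, fp):
    """Area under a ROC curve built from one threshold: (0,0) -> (FPR, TPR) -> (1,1)."""
    tpr = tp / (tp + fn)          # sensitivity / recall
    fpr = fp / (fp + tn)          # 1 - specificity
    # Sum of the two trapezoids; algebraically this equals (TPR + 1 - FPR) / 2.
    return fpr * tpr / 2 + (1 - fpr) * (tpr + 1) / 2


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    # Inferred (hypothetical) counts for folds 7 and 10 of the Random Forest run.
    for counts in [(15, 0, 15, 1), (16, 0, 14, 1)]:
        print(counts, hard_label_auc(*counts), balanced_accuracy(*counts))
```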

key: test_jcc
value: [0.73684211 0.88235294 0.83333333 0.88888889 0.88888889 0.94117647
 0.9375     1.         1.         0.94117647]

mean value: 0.905015909872721

key: train_jcc
value: [0.96575342 0.95918367 0.95238095 0.94       0.94594595 0.94594595
 0.9527027  0.94630872 0.94594595 0.94594595]

mean value: 0.9500113261826575
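All of the per-fold test scores above are derived from the same 2x2 confusion matrix of the held-out fold. The snippet below is a pure-Python cross-check of that relationship. It is a sketch, not the script's own scoring code. The fold 1 counts (TP = 14, FP = 3, FN = 2, TN = 13) are an assumption, inferred from the logged precision 0.82352941 (14/17), recall 0.875 (14/16) and accuracy 0.84375 (27/32). Plugging them in should reproduce that fold's MCC, F1 and Jaccard values.

```python
from math import sqrt


def scores_from_counts(tp, fp, fn, tn):
    """Binary-classification scores from confusion-matrix counts (positive class = 1)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
    }


if __name__ == "__main__":
    # Fold 1 counts inferred from the logged fold-1 scores (assumption, see above).
    for name, value in scores_from_counts(tp=14, fp=3, fn=2, tn=13).items():
        print(f"{name}: {value:.8f}")
```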

MCC on Blind test: 0.15

Accuracy on Blind test: 0.3

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01972842 0.00707984 0.00703955 0.00708413 0.00708127 0.00711823
 0.0071497  0.00713921 0.00711536 0.00711989]

mean value: 0.008365559577941894

key: score_time
value: [0.00945282 0.00773811 0.00782251 0.00773239 0.00773835 0.00778341
 0.00787854 0.00775337 0.00779319 0.00774002]

mean value: 0.007943272590637207

key: test_mcc
value: [0.68884672 0.56360186 0.68884672 0.625      0.438357   0.68884672
 0.48954403 0.48333333 0.55573827 0.55573827]

mean value: 0.5777852941864914

key: train_mcc
value: [0.64814452 0.64814452 0.6479516  0.63405443 0.65572679 0.62714946
 0.62393794 0.65616074 0.64212548 0.6494089 ]

mean value: 0.6432804381067745

key: test_accuracy
value: [0.84375    0.78125    0.84375    0.8125     0.71875    0.84375
 0.74193548 0.74193548 0.77419355 0.77419355]

mean value: 0.7876008064516129

key: train_accuracy
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
 0.81052632 0.82807018 0.82105263 0.8245614 ]

mean value: 0.8213787991104522

key: test_fscore
value: [0.83870968 0.77419355 0.84848485 0.8125     0.70967742 0.84848485
 0.75       0.73333333 0.8        0.8       ]

mean value: 0.791538367546432

key: train_fscore
value: [0.82638889 0.82638889 0.82269504 0.81944444 0.83161512 0.816609
 0.82       0.82807018 0.82105263 0.82638889]

mean value: 0.8238653070404355

key: test_precision
value: [0.86666667 0.8        0.82352941 0.8125     0.73333333 0.82352941
 0.70588235 0.73333333 0.73684211 0.73684211]

mean value: 0.7772458720330238

key: train_precision
value: [0.81506849 0.81506849 0.82857143 0.80821918 0.81208054 0.80272109
 0.78343949 0.83098592 0.81818182 0.81506849]

mean value: 0.8129404935574437

key: test_recall
value: [0.8125     0.75       0.875      0.8125     0.6875     0.875
 0.8        0.73333333 0.875      0.875     ]

mean value: 0.8095833333333333

key: train_recall
value: [0.83802817 0.83802817 0.81690141 0.83098592 0.85211268 0.83098592
 0.86013986 0.82517483 0.82394366 0.83802817]

mean value: 0.8354328769821727

key: test_roc_auc
value: [0.84375    0.78125    0.84375    0.8125     0.71875    0.84375
 0.74375    0.74166667 0.77083333 0.77083333]

mean value: 0.7870833333333334

key: train_roc_auc
value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028
 0.81035162 0.82808037 0.82106274 0.82460849]

mean value: 0.8213680685511672

key: test_jcc
value: [0.72222222 0.63157895 0.73684211 0.68421053 0.55       0.73684211
 0.6        0.57894737 0.66666667 0.66666667]

mean value: 0.6573976608187134

key: train_jcc
value: [0.70414201 0.70414201 0.69879518 0.69411765 0.71176471 0.69005848
 0.69491525 0.70658683 0.69642857 0.70414201]

mean value: 0.7005092700712355

MCC on Blind test: 0.19

Accuracy on Blind test: 0.54

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.10866404 0.04417276 0.08032727 0.0377512  0.03836942 0.03934383
 0.0412488  0.73144245 0.03698397 0.03865409]

mean value: 0.11969578266143799

key: score_time
value: [0.0095489  0.00957394 0.00984144 0.00939536 0.0093596  0.00946164
 0.00942516 0.00999594 0.01063395 0.00950313]

mean value: 0.00967390537261963

key: test_mcc
value: [0.81409158 0.81409158 0.875      0.93933644 0.8819171  1.
 0.9375     1.         0.9375     0.87770745]

mean value: 0.9077144148609821

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.90625    0.90625    0.9375     0.96875    0.9375     1.
 0.96774194 1.         0.96774194 0.93548387]

mean value: 0.9527217741935484

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.90322581 0.9375     0.96969697 0.94117647 1.
 0.96774194 1.         0.96774194 0.94117647]

mean value: 0.9537350497383704

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88235294 0.93333333 0.9375     0.94117647 0.88888889 1.
 0.9375     1.         1.         0.88888889]

mean value: 0.9409640522875817

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9375 0.875  0.9375 1.     1.     1.     1.     1.     0.9375 1.    ]

mean value: 0.96875

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90625    0.90625    0.9375     0.96875    0.9375     1.
 0.96875    1.         0.96875    0.93333333]

mean value: 0.9527083333333334

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.82352941 0.88235294 0.94117647 0.88888889 1.
 0.9375     1.         0.9375     0.88888889]

mean value: 0.9133169934640523

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
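F1 and Jaccard are built from the same counts: F1 = 2TP/(2TP+FP+FN) and Jaccard = TP/(TP+FP+FN). As a result, the Jaccard score is a fixed function of F1: J = F1 / (2 - F1). This identity gives a quick consistency check on the logged test_jcc values. The sketch below applies it to the first three XGBoost folds, using the fold values copied from this log.

```python
def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score computed on the same confusion matrix."""
    return f1 / (2 - f1)


# First three XGBoost folds: (logged test_fscore, logged test_jcc).
XGB_FOLDS = [(0.90909091, 0.83333333), (0.90322581, 0.82352941), (0.9375, 0.88235294)]

if __name__ == "__main__":
    for f1, jcc in XGB_FOLDS:
        print(f"F1 {f1:.8f} -> implied J {jaccard_from_f1(f1):.8f} (logged {jcc:.8f})")
```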

MCC on Blind test: 0.06

Accuracy on Blind test: 0.2

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01153302 0.0145793  0.014395   0.0144124  0.01459241 0.0143168
 0.01439118 0.01458287 0.01461196 0.0144515 ]

mean value: 0.014186644554138183

key: score_time
value: [0.01013279 0.01050425 0.0104897  0.0105176  0.01051116 0.01054525
 0.01043344 0.01050496 0.01060581 0.01054263]

mean value: 0.010478758811950683

key: test_mcc
value: [0.81409158 0.81409158 0.93933644 1.         0.8819171  1.
 0.87083333 1.         1.         0.9372467 ]

mean value: 0.9257516728277053

key: train_mcc
value: [0.95812669 0.95812669 0.94403659 0.93720088 0.94403659 0.93720088
 0.95108379 0.95145657 0.94470481 0.9582759 ]

mean value: 0.948424939171215

key: test_accuracy
value: [0.90625    0.90625    0.96875    1.         0.9375     1.
 0.93548387 1.         1.         0.96774194]

mean value: 0.9621975806451613

key: train_accuracy
value: [0.97887324 0.97887324 0.97183099 0.96830986 0.97183099 0.96830986
 0.9754386  0.9754386  0.97192982 0.97894737]

mean value: 0.9739782554978997

key: test_fscore
value: [0.90909091 0.90322581 0.96969697 1.         0.94117647 1.
 0.93333333 1.         1.         0.96969697]

mean value: 0.962622045885803

key: train_fscore
value: [0.97916667 0.97916667 0.97222222 0.96885813 0.97222222 0.96885813
 0.97577855 0.97594502 0.97241379 0.97916667]

mean value: 0.9743798064418605

key: test_precision
value: [0.88235294 0.93333333 0.94117647 1.         0.88888889 1.
 0.93333333 1.         1.         0.94117647]

mean value: 0.9520261437908497

key: train_precision
value: [0.96575342 0.96575342 0.95890411 0.95238095 0.95890411 0.95238095
 0.96575342 0.95945946 0.9527027  0.96575342]

mean value: 0.9597745984732285

key: test_recall
value: [0.9375     0.875      1.         1.         1.         1.
 0.93333333 1.         1.         1.        ]

mean value: 0.9745833333333334

key: train_recall
value: [0.99295775 0.99295775 0.98591549 0.98591549 0.98591549 0.98591549
 0.98601399 0.99300699 0.99295775 0.99295775]

mean value: 0.9894513936767458

key: test_roc_auc
value: [0.90625    0.90625    0.96875    1.         0.9375     1.
 0.93541667 1.         1.         0.96666667]

mean value: 0.9620833333333333

key: train_roc_auc
value: [0.97887324 0.97887324 0.97183099 0.96830986 0.97183099 0.96830986
 0.97540136 0.97537674 0.97200335 0.97899636]

mean value: 0.9739805968679208

key: test_jcc
value: [0.83333333 0.82352941 0.94117647 1.         0.88888889 1.
 0.875      1.         1.         0.94117647]

mean value: 0.9303104575163399

key: train_jcc
value: [0.95918367 0.95918367 0.94594595 0.93959732 0.94594595 0.93959732
 0.9527027  0.95302013 0.94630872 0.95918367]

mean value: 0.9500669104935644

MCC on Blind test: 0.16

Accuracy on Blind test: 0.36

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.00940704 0.00748181 0.00721669 0.00730562 0.00781703 0.00796103
 0.0077672  0.00788808 0.00784159 0.00789857]

mean value: 0.007858467102050782

key: score_time
value: [0.00908256 0.00800729 0.00791621 0.00769114 0.00820541 0.00846505
 0.00846457 0.00859737 0.00852084 0.00845647]

mean value: 0.008340692520141602

key: test_mcc
value: [0.62994079 0.50395263 0.62994079 0.68884672 0.62994079 0.75592895
 0.67916667 0.61925228 0.74689528 0.66057826]

mean value: 0.6544443153383147

key: train_mcc
value: [0.67386056 0.69575325 0.68038921 0.67508446 0.67277821 0.66621443
 0.66189073 0.68037155 0.67635913 0.66649204]

mean value: 0.67491935676675

key: test_accuracy
value: [0.8125     0.75       0.8125     0.84375    0.8125     0.875
 0.83870968 0.80645161 0.87096774 0.80645161]

mean value: 0.822883064516129

key: train_accuracy
value: [0.83450704 0.84507042 0.83802817 0.83450704 0.83450704 0.83098592
 0.82807018 0.83859649 0.83508772 0.83157895]

mean value: 0.835093896713615

key: test_fscore
value: [0.8        0.76470588 0.82352941 0.84848485 0.82352941 0.88235294
 0.83870968 0.8125     0.88235294 0.84210526]

mean value: 0.8318270377297392

key: train_fscore
value: [0.84385382 0.85430464 0.84666667 0.84488449 0.84280936 0.84
 0.83934426 0.84666667 0.84488449 0.83892617]

mean value: 0.844234056793084

key: test_precision
value: [0.85714286 0.72222222 0.77777778 0.82352941 0.77777778 0.83333333
 0.8125     0.76470588 0.83333333 0.72727273]

mean value: 0.7929595322977676

key: train_precision
value: [0.79874214 0.80625    0.80379747 0.79503106 0.80254777 0.79746835
 0.79012346 0.8089172  0.79503106 0.80128205]

mean value: 0.7999190549175873

key: test_recall
value: [0.75       0.8125     0.875      0.875      0.875      0.9375
 0.86666667 0.86666667 0.9375     1.        ]

mean value: 0.8795833333333334

key: train_recall
value: [0.8943662  0.9084507  0.8943662  0.90140845 0.88732394 0.88732394
 0.8951049  0.88811189 0.90140845 0.88028169]

mean value: 0.8938146360681573

key: test_roc_auc
value: [0.8125     0.75       0.8125     0.84375    0.8125     0.875
 0.83958333 0.80833333 0.86875    0.8       ]

mean value: 0.8222916666666666

key: train_roc_auc
value: [0.83450704 0.84507042 0.83802817 0.83450704 0.83450704 0.83098592
 0.82783414 0.83842214 0.83531961 0.83174924]

mean value: 0.8350930759381464

key: test_jcc
value: [0.66666667 0.61904762 0.7        0.73684211 0.7        0.78947368
 0.72222222 0.68421053 0.78947368 0.72727273]

mean value: 0.7135209235209236

key: train_jcc
value: [0.72988506 0.74566474 0.73410405 0.73142857 0.7283237  0.72413793
 0.72316384 0.73410405 0.73142857 0.72254335]

mean value: 0.7304783857563864

MCC on Blind test: 0.22

Accuracy on Blind test: 0.54

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.00985122 0.01013541 0.01139951 0.01236129 0.01133943 0.01101375
 0.01197648 0.01175308 0.01180387 0.0115931 ]

mean value: 0.011322712898254395

key: score_time
value: [0.00835967 0.01045895 0.01057839 0.01042914 0.01039219 0.01044703
 0.01043487 0.01063824 0.01045942 0.01041865]

mean value: 0.0102616548538208

key: test_mcc
value: [0.75592895 0.68884672 0.8819171  0.67419986 0.8819171  0.81409158
 0.87866878 0.9375     0.87083333 0.9372467 ]

mean value: 0.8321150124795701

key: train_mcc
value: [0.97183099 0.92966968 0.93775982 0.8661418  0.92365817 0.90901439
 0.95798651 0.9114673  0.78397114 0.94395469]

mean value: 0.9135454491091561

key: test_accuracy
value: [0.875      0.84375    0.9375     0.8125     0.9375     0.90625
 0.93548387 0.96774194 0.93548387 0.96774194]

mean value: 0.9118951612903226

key: train_accuracy
value: [0.98591549 0.96478873 0.96830986 0.92957746 0.96126761 0.95422535
 0.97894737 0.95438596 0.89122807 0.97192982]

mean value: 0.9560575735112429

key: test_fscore
value: [0.88235294 0.83870968 0.94117647 0.76923077 0.94117647 0.90909091
 0.9375     0.96774194 0.9375     0.96969697]

mean value: 0.9094176143274815

key: train_fscore
value: [0.98591549 0.96503497 0.96907216 0.92481203 0.96219931 0.9550173
 0.97916667 0.95622896 0.88727273 0.97202797]

mean value: 0.9556747588965514

key: test_precision
value: [0.83333333 0.86666667 0.88888889 1.         0.88888889 0.88235294
 0.88235294 0.9375     0.9375     0.94117647]

mean value: 0.9058660130718954

key: train_precision
value: [0.98591549 0.95833333 0.94630872 0.99193548 0.93959732 0.93877551
 0.97241379 0.92207792 0.91729323 0.96527778]

mean value: 0.9537928586676441

key: test_recall
value: [0.9375 0.8125 1.     0.625  1.     0.9375 1.     1.     0.9375 1.    ]

mean value: 0.925

key: train_recall
value: [0.98591549 0.97183099 0.99295775 0.86619718 0.98591549 0.97183099
 0.98601399 0.99300699 0.85915493 0.97887324]

mean value: 0.9591697035359007

key: test_roc_auc
value: [0.875      0.84375    0.9375     0.8125     0.9375     0.90625
 0.9375     0.96875    0.93541667 0.96666667]

mean value: 0.9120833333333334

key: train_roc_auc
value: [0.98591549 0.96478873 0.96830986 0.92957746 0.96126761 0.95422535
 0.97892249 0.95424998 0.89111593 0.9719541 ]

mean value: 0.9560326996946715

key: test_jcc
value: [0.78947368 0.72222222 0.88888889 0.625      0.88888889 0.83333333
 0.88235294 0.9375     0.88235294 0.94117647]

mean value: 0.8391189370485036

key: train_jcc
value: [0.97222222 0.93243243 0.94       0.86013986 0.92715232 0.91390728
 0.95918367 0.91612903 0.79738562 0.94557823]

mean value: 0.9164130675378523
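The `key:`/`value:`/`mean value:` blocks look like the dictionary returned by scikit-learn's `cross_validate` when it is given a named scoring dict and `return_train_score=True`, printed one key at a time. The sketch below assumes that is how the script produced them. The scorer names (`mcc`, `fscore`, `jcc`, ...) are inferred from the printed keys, and synthetic `make_classification` data stands in for the real feature table. Here `roc_auc` is computed from hard predictions. That fits the log, where `test_roc_auc` equals `test_accuracy` in the folds with equal class counts, but whether the original script does this is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names are inferred from the logged keys (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on hard predictions
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; the real script uses its own training split.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(LogisticRegression(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same layout as the log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```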

MCC on Blind test: 0.18

Accuracy on Blind test: 0.49
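After cross-validation, each model also reports a single `MCC on Blind test` / `Accuracy on Blind test` pair. The sketch below shows one way to produce such a pair. It assumes the model is refit on the whole training split, scored once on a held-out blind set, and both values are rounded to two decimals, which would explain outputs such as `0.2` rather than `0.20`. The `train_test_split` call and the synthetic data are illustrative stand-ins, not the script's actual split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Hypothetical data and split standing in for the real training/blind sets.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Refit on all training data, then score once on the untouched blind set.
model = SGDClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print(f"MCC on Blind test: {blind_mcc}")
print(f"Accuracy on Blind test: {blind_acc}")
```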

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])
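The pipeline above has three stages. It scales the numerical columns with `MinMaxScaler`, one-hot encodes the categorical columns, and passes any remaining columns through unchanged before the classifier sees them. The sketch below builds the same structure. It uses only three of the logged column names (`ligand_distance`, `rsa`, `ss_class`), and a small hypothetical DataFrame replaces the real training data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# A subset of the logged column names; the real lists are much longer.
num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale numeric features to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one dummy column per category
    ],
    remainder="passthrough",                 # any other columns are kept as-is
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", SGDClassifier(n_jobs=10, random_state=42))])

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 20),
    "rsa": rng.uniform(0, 1, 20),
    "ss_class": rng.choice(["H", "E", "L"], 20),
})
y = np.array([0, 1] * 10)

pipe.fit(X, y)
preds = pipe.predict(X)
```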

key: fit_time
value: [0.01099372 0.01085377 0.01167774 0.011415   0.01157284 0.01085854
 0.01104522 0.01073241 0.01113367 0.01111031]

mean value: 0.011139321327209472

key: score_time
value: [0.0103929  0.0103898  0.01037788 0.0103898  0.01040602 0.01039386
 0.01040792 0.01042295 0.01051378 0.01045513]

mean value: 0.010415005683898925

key: test_mcc
value: [0.44539933 0.32025631 0.81409158 0.57735027 0.77459667 0.75592895
 0.87866878 0.9375     0.87083333 0.76594169]

mean value: 0.714056690328539

key: train_mcc
value: [0.87107074 0.62077843 0.83774371 0.57207859 0.80452795 0.84114227
 0.89199759 0.83981496 0.95090121 0.86664533]

mean value: 0.8096700777785382

key: test_accuracy
value: [0.71875    0.625      0.90625    0.75       0.875      0.875
 0.93548387 0.96774194 0.93548387 0.87096774]

mean value: 0.8459677419354839

key: train_accuracy
value: [0.93309859 0.77816901 0.91549296 0.75       0.8943662  0.91549296
 0.94385965 0.91578947 0.9754386  0.92982456]

mean value: 0.8951531999011614

key: test_fscore
value: [0.68965517 0.45454545 0.90909091 0.66666667 0.88888889 0.88235294
 0.9375     0.96774194 0.9375     0.88888889]

mean value: 0.8222830857154942

key: train_fscore
value: [0.92936803 0.71493213 0.9205298  0.66976744 0.90384615 0.92156863
 0.94666667 0.92156863 0.9754386  0.93377483]

mean value: 0.8837460905964674

key: test_precision
value: [0.76923077 0.83333333 0.88235294 1.         0.8        0.83333333
 0.88235294 0.9375     0.9375     0.8       ]

mean value: 0.8675603318250378

key: train_precision
value: [0.98425197 1.         0.86875    0.98630137 0.82941176 0.8597561
 0.9044586  0.86503067 0.97202797 0.88125   ]

mean value: 0.9151238446234521

key: test_recall
value: [0.625  0.3125 0.9375 0.5    1.     0.9375 1.     1.     0.9375 1.    ]

mean value: 0.825

key: train_recall
value: [0.88028169 0.55633803 0.97887324 0.50704225 0.99295775 0.99295775
 0.99300699 0.98601399 0.97887324 0.99295775]

mean value: 0.8859302669161824

key: test_roc_auc
value: [0.71875    0.625      0.90625    0.75       0.875      0.875
 0.9375     0.96875    0.93541667 0.86666667]

mean value: 0.8458333333333333

key: train_roc_auc
value: [0.93309859 0.77816901 0.91549296 0.75       0.8943662  0.91549296
 0.9436866  0.9155422  0.97545061 0.93004531]

mean value: 0.8951344430217669

key: test_jcc
value: [0.52631579 0.29411765 0.83333333 0.5        0.8        0.78947368
 0.88235294 0.9375     0.88235294 0.8       ]

mean value: 0.7245446336429309

key: train_jcc
value: [0.86805556 0.55633803 0.85276074 0.5034965  0.8245614  0.85454545
 0.89873418 0.85454545 0.95205479 0.8757764 ]

mean value: 0.8040868505268339
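The `mcc` and `jcc` keys can be recomputed from confusion-matrix counts. Reading `jcc` as the Jaccard score of the positive class is an assumption, but the log supports it. For binary labels, Jaccard equals F1 / (2 - F1), and the first SGD fold's `test_fscore` of 0.68965517 maps to 0.5263158. That matches the fold's `test_jcc`. The sketch below checks the count-based formulas against scikit-learn on a tiny hand-made example.

```python
import math

from sklearn.metrics import confusion_matrix, jaccard_score, matthews_corrcoef


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard_from_counts(tp, fp, fn):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    union = tp + fp + fn
    return tp / union if union else 0.0


def jaccard_from_f1(f1):
    """For binary labels, J = F1 / (2 - F1)."""
    return f1 / (2 - f1)


# Toy labels: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(mcc_from_counts(tp, tn, fp, fn), matthews_corrcoef(y_true, y_pred))
print(jaccard_from_counts(tp, fp, fn), jaccard_score(y_true, y_pred))
print(jaccard_from_f1(0.68965517))  # first SGD fold: test_fscore -> test_jcc
```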

MCC on Blind test: 0.08

Accuracy on Blind test: 0.19

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.09240365 0.08133793 0.08125806 0.08046818 0.08067393 0.08131313
 0.08116984 0.0813272  0.08116603 0.08120728]

mean value: 0.08223252296447754

key: score_time
value: [0.01535177 0.0154326  0.01515222 0.01522565 0.01519728 0.0153811
 0.01536131 0.01532435 0.01529288 0.01531577]

mean value: 0.015303492546081543

key: test_mcc
value: [0.81409158 0.875      0.93933644 0.81409158 0.93933644 1.
 0.9375     1.         1.         0.87770745]

mean value: 0.9197063481549348

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.90625    0.9375     0.96875    0.90625    0.96875    1.
 0.96774194 1.         1.         0.93548387]

mean value: 0.9590725806451613

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.9375     0.96969697 0.90322581 0.96969697 1.
 0.96774194 1.         1.         0.94117647]

mean value: 0.9598129061008568

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88235294 0.9375     0.94117647 0.93333333 0.94117647 1.
 0.9375     1.         1.         0.88888889]

mean value: 0.9461928104575164

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9375 0.9375 1.     0.875  1.     1.     1.     1.     1.     1.    ]

mean value: 0.975

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90625    0.9375     0.96875    0.90625    0.96875    1.
 0.96875    1.         1.         0.93333333]

mean value: 0.9589583333333334

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.88235294 0.94117647 0.82352941 0.94117647 1.
 0.9375     1.         1.         0.88888889]

mean value: 0.9247957516339869

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.19

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])
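The Bagging step is configured with `oob_score=True`, but the log never prints the resulting out-of-bag estimate. The sketch below reads it back from a fitted pipeline through `named_steps`, using synthetic data. The plain `MinMaxScaler` stands in for the real `ColumnTransformer`. With the default `n_estimators=10`, scikit-learn may warn that some samples received no out-of-bag prediction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; MinMaxScaler stands in for the real 'prep' step.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
pipe = Pipeline([
    ("prep", MinMaxScaler()),
    ("model", BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
])
pipe.fit(X, y)

# oob_score_ is the accuracy on samples left out of each bootstrap draw.
oob = pipe.named_steps["model"].oob_score_
print(f"OOB accuracy: {oob:.3f}")
```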

key: fit_time
value: [0.03232431 0.02845907 0.02876759 0.02964163 0.04149294 0.0316689
 0.04340553 0.03582311 0.04470134 0.04033637]

mean value: 0.035662078857421876

key: score_time
value: [0.01759839 0.02218199 0.01889658 0.01943088 0.02917433 0.03231716
 0.03496408 0.03411865 0.02071142 0.01735568]

mean value: 0.02467491626739502

key: test_mcc
value: [0.81409158 0.81409158 0.875      0.93933644 1.         1.
 0.87866878 1.         0.87866878 0.9372467 ]

mean value: 0.913710384964254

key: train_mcc
value: [0.99298237 1.         0.99298237 1.         0.99298237 0.98591549
 1.         0.98596474 0.99300665 0.98596474]

mean value: 0.9929798730055359

key: test_accuracy
value: [0.90625    0.90625    0.9375     0.96875    1.         1.
 0.93548387 1.         0.93548387 0.96774194]

mean value: 0.9557459677419354

key: train_accuracy
value: [0.99647887 1.         0.99647887 1.         0.99647887 0.99295775
 1.         0.99298246 0.99649123 0.99298246]

mean value: 0.996485050654806

key: test_fscore
value: [0.90909091 0.90322581 0.9375     0.96774194 1.         1.
 0.9375     1.         0.93333333 0.96969697]

mean value: 0.9558088954056696

key: train_fscore
value: [0.99646643 1.         0.99646643 1.         0.99646643 0.99295775
 1.         0.99300699 0.99646643 0.99295775]

mean value: 0.9964788210346365

key: test_precision
value: [0.88235294 0.93333333 0.9375     1.         1.         1.
 0.88235294 1.         1.         0.94117647]

mean value: 0.957671568627451

key: train_precision
value: [1.         1.         1.         1.         1.         0.99295775
 1.         0.99300699 1.         0.99295775]

mean value: 0.997892248596474

key: test_recall
value: [0.9375 0.875  0.9375 0.9375 1.     1.     1.     1.     0.875  1.    ]

mean value: 0.95625

key: train_recall
value: [0.99295775 1.         0.99295775 1.         0.99295775 0.99295775
 1.         0.99300699 0.99295775 0.99295775]

mean value: 0.9950753471880233

key: test_roc_auc
value: [0.90625    0.90625    0.9375     0.96875    1.         1.
 0.9375     1.         0.9375     0.96666667]

mean value: 0.9560416666666667

key: train_roc_auc
value: [0.99647887 1.         0.99647887 1.         0.99647887 0.99295775
 1.         0.99298237 0.99647887 0.99298237]

mean value: 0.9964837978922486

key: test_jcc
value: [0.83333333 0.82352941 0.88235294 0.9375     1.         1.
 0.88235294 1.         0.875      0.94117647]

mean value: 0.9175245098039215

key: train_jcc
value: [0.99295775 1.         0.99295775 1.         0.99295775 0.98601399
 1.         0.98611111 0.99295775 0.98601399]

mean value: 0.9929970069054577

MCC on Blind test: 0.06

Accuracy on Blind test: 0.2

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.05620861 0.0974803  0.04944372 0.04494715 0.06924796 0.05304265
 0.03282356 0.03296423 0.03626871 0.06697369]

mean value: 0.0539400577545166

key: score_time
value: [0.02177811 0.02000475 0.01141953 0.01396704 0.02782083 0.01147771
 0.011482   0.01143765 0.01138997 0.02080035]

mean value: 0.01615779399871826

key: test_mcc
value: [0.62994079 0.438357   0.56360186 0.68884672 0.75       0.68884672
 0.80833333 0.74166667 0.68826048 0.76594169]

mean value: 0.6763795258534475

key: train_mcc
value: [0.8612933  0.86052165 0.83971646 0.85382934 0.85314992 0.86794223
 0.84766497 0.84023701 0.85436741 0.84697783]

mean value: 0.8525700111060143

key: test_accuracy
value: [0.8125     0.71875    0.78125    0.84375    0.875      0.84375
 0.90322581 0.87096774 0.83870968 0.87096774]

mean value: 0.8358870967741936

key: train_accuracy
value: [0.92957746 0.92957746 0.91901408 0.92605634 0.92605634 0.93309859
 0.92280702 0.91929825 0.92631579 0.92280702]

mean value: 0.925460835186558

key: test_fscore
value: [0.8        0.70967742 0.78787879 0.84848485 0.875      0.84848485
 0.90322581 0.86666667 0.85714286 0.88888889]

mean value: 0.838545012335335

key: train_fscore
value: [0.93197279 0.93150685 0.92150171 0.92832765 0.92783505 0.93515358
 0.92567568 0.9220339  0.92832765 0.92465753]

mean value: 0.9276992378409221

key: test_precision
value: [0.85714286 0.73333333 0.76470588 0.82352941 0.875      0.82352941
 0.875      0.86666667 0.78947368 0.8       ]

mean value: 0.8208381247235736

key: train_precision
value: [0.90131579 0.90666667 0.89403974 0.90066225 0.90604027 0.90728477
 0.89542484 0.89473684 0.90066225 0.9       ]

mean value: 0.9006833409925814

key: test_recall
value: [0.75       0.6875     0.8125     0.875      0.875      0.875
 0.93333333 0.86666667 0.9375     1.        ]

mean value: 0.86125

key: train_recall
value: [0.96478873 0.95774648 0.95070423 0.95774648 0.95070423 0.96478873
 0.95804196 0.95104895 0.95774648 0.95070423]

mean value: 0.9564020486555698

key: test_roc_auc
value: [0.8125     0.71875    0.78125    0.84375    0.875      0.84375
 0.90416667 0.87083333 0.83541667 0.86666667]

mean value: 0.8352083333333333

key: train_roc_auc
value: [0.92957746 0.92957746 0.91901408 0.92605634 0.92605634 0.93309859
 0.92268295 0.91918645 0.92642569 0.92290456]

mean value: 0.9254579927115139

key: test_jcc
value: [0.66666667 0.55       0.65       0.73684211 0.77777778 0.73684211
 0.82352941 0.76470588 0.75       0.8       ]

mean value: 0.7256363949088407

key: train_jcc
value: [0.87261146 0.87179487 0.85443038 0.86624204 0.86538462 0.87820513
 0.86163522 0.85534591 0.86624204 0.85987261]

mean value: 0.8651764280073164

MCC on Blind test: 0.18

Accuracy on Blind test: 0.54

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])

key: fit_time
value: [0.16305089 0.16052961 0.15664053 0.15771556 0.15495181 0.15255976
 0.15471911 0.15581322 0.15795827 0.15890527]

mean value: 0.15728440284729003

key: score_time
value: [0.00907922 0.00902605 0.00912547 0.0093677  0.00861001 0.00851989
 0.00923562 0.00842047 0.00907159 0.00920391]

mean value: 0.00896599292755127

key: test_mcc
value: [0.81409158 0.875      0.875      1.         1.         1.
 0.9375     1.         1.         0.9372467 ]

mean value: 0.9438838276217104

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.90625    0.9375     0.9375     1.         1.         1.
 0.96774194 1.         1.         0.96774194]

mean value: 0.9716733870967742

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.90909091 0.9375     0.9375     1.         1.         1.
 0.96774194 1.         1.         0.96969697]

mean value: 0.972152981427175

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.88235294 0.9375     0.9375     1.         1.         1.
 0.9375     1.         1.         0.94117647]

mean value: 0.9636029411764706

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.9375 0.9375 0.9375 1.     1.     1.     1.     1.     1.     1.    ]

mean value: 0.98125

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.90625    0.9375     0.9375     1.         1.         1.
 0.96875    1.         1.         0.96666667]

mean value: 0.9716666666666667

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.83333333 0.88235294 0.88235294 1.         1.         1.
 0.9375     1.         1.         0.94117647]

mean value: 0.9476715686274509

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.19
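Every model in this section scores far lower on the blind test than in cross-validation. AdaBoost and Gradient Boosting, for example, reach a mean CV test MCC above 0.9 with a train MCC of 1.0, but only 0.05 on the blind set. The sketch below ranks the models in this section by that gap. It uses the mean `test_mcc` and blind-test MCC values copied from the log. The log alone cannot say why the gap is this large: overfitting, a shift between the training and blind sets, or something else.

```python
# (mean CV test MCC, blind-test MCC), copied from this section of the log.
results = {
    "Stochastic GDescent": (0.714056690328539, 0.08),
    "AdaBoost Classifier": (0.9197063481549348, 0.05),
    "Bagging Classifier": (0.913710384964254, 0.06),
    "Gaussian Process": (0.6763795258534475, 0.18),
    "Gradient Boosting": (0.9438838276217104, 0.05),
}


def rank_by_gap(results):
    """Sort models by how far blind MCC falls below mean CV test MCC."""
    gaps = {name: cv - blind for name, (cv, blind) in results.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_gap(results)
for name, gap in ranked:
    print(f"{name:22s} CV-minus-blind MCC gap: {gap:.2f}")
```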

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])
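Each "Running model pipeline" entry prints the same preprocessing: MinMaxScaler on the numerical columns, OneHotEncoder on the categorical columns, remainder='passthrough', then the model. A minimal sketch of building such a pipeline, assuming a pandas DataFrame input. build_pipeline is a hypothetical helper and the toy data are synthetic; only the column names come from the log. LogisticRegression stands in for the model, and other estimators from the model list can be passed the same way.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_pipeline(num_cols, cat_cols, model):
    """Hypothetical helper: scale numerical columns to [0, 1], one-hot encode
    categorical columns, pass any other columns through, then fit `model`."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])


# Synthetic toy data; column names borrowed from the log, values are made up.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, 40),
    "rsa": rng.uniform(0, 1, 40),
    "active_site": np.tile(["0", "1", "1", "0"], 10),
})
y = np.array([0, 1] * 20)

pipe = build_pipeline(["ligand_distance", "rsa"], ["active_site"],
                      LogisticRegression(random_state=42))
pipe.fit(X, y)
```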

key: fit_time
value: [0.01113367 0.01251006 0.01261091 0.01772285 0.01263809 0.01343751
 0.0127852  0.01266503 0.01297426 0.01285744]

mean value: 0.013133502006530762

key: score_time
value: [0.01069093 0.01078391 0.0107305  0.01083326 0.01099324 0.01084566
 0.01136661 0.01079345 0.01084757 0.01162291]

mean value: 0.010950803756713867

key: test_mcc
value: [0.68884672 0.59215653 0.81409158 0.56360186 0.77459667 0.75
 0.74896053 0.54812195 0.53006813 0.82078268]

mean value: 0.6831226650318738

key: train_mcc
value: [0.8145351  0.86223926 0.87332606 0.85924016 0.86725157 0.87541287
 0.7742616  0.84773912 0.81144956 0.88848951]

mean value: 0.8473944811490282

key: test_accuracy
value: [0.84375    0.78125    0.90625    0.78125    0.875      0.875
 0.87096774 0.77419355 0.74193548 0.90322581]

mean value: 0.8352822580645162

key: train_accuracy
value: [0.90140845 0.92957746 0.93661972 0.92957746 0.93309859 0.93661972
 0.88070175 0.92280702 0.90175439 0.94385965]

mean value: 0.9216024215468248

key: test_fscore
value: [0.83870968 0.81081081 0.90322581 0.78787879 0.88888889 0.875
 0.875      0.75862069 0.69230769 0.91428571]

mean value: 0.8344728067698034

key: train_fscore
value: [0.89230769 0.92647059 0.93706294 0.92907801 0.9347079  0.93430657
 0.89102564 0.92028986 0.89393939 0.94244604]

mean value: 0.9201634638116422

key: test_precision
value: [0.86666667 0.71428571 0.93333333 0.76470588 0.8        0.875
 0.82352941 0.78571429 0.9        0.84210526]

mean value: 0.8305340557275542

key: train_precision
value: [0.98305085 0.96923077 0.93055556 0.93571429 0.91275168 0.96969697
 0.82248521 0.95488722 0.96721311 0.96323529]

mean value: 0.9408820939525007

key: test_recall
value: [0.8125     0.9375     0.875      0.8125     1.         0.875
 0.93333333 0.73333333 0.5625     1.        ]

mean value: 0.8541666666666666

key: train_recall
value: [0.81690141 0.88732394 0.94366197 0.92253521 0.95774648 0.90140845
 0.97202797 0.88811189 0.83098592 0.92253521]

mean value: 0.9043238451689156

key: test_roc_auc
value: [0.84375    0.78125    0.90625    0.78125    0.875      0.875
 0.87291667 0.77291667 0.74791667 0.9       ]

mean value: 0.8356250000000001

key: train_roc_auc
value: [0.90140845 0.92957746 0.93661972 0.92957746 0.93309859 0.93661972
 0.88038018 0.92292918 0.90150694 0.94378509]

mean value: 0.9215502807052103

key: test_jcc
value: [0.72222222 0.68181818 0.82352941 0.65       0.8        0.77777778
 0.77777778 0.61111111 0.52941176 0.84210526]

mean value: 0.7215753510335554

key: train_jcc
value: [0.80555556 0.8630137  0.88157895 0.86754967 0.87741935 0.87671233
 0.80346821 0.85234899 0.80821918 0.89115646]

mean value: 0.8527022396082421
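The key / value / mean value blocks have the same shape as the dictionary returned by scikit-learn's cross_validate with return_train_score=True and a named scoring dict (fit_time, score_time, test_<name>, train_<name>). A sketch assuming that is how the script produces them; the scorer definitions, the 10-fold StratifiedKFold and the synthetic data are assumptions, not taken from the script. roc_auc is scored on hard predictions here, an assumption that fits the log, where test_roc_auc equals test_accuracy on the 32-sample folds with equal class counts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # computed from predict(), i.e. hard labels
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(LogisticRegression(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same layout as the log.
for key, value in scores.items():
    print(f"\nkey: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}")
```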

MCC on Blind test: 0.2

Accuracy on Blind test: 0.58
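The two blind-test lines report MCC and accuracy on the held-out set, apparently rounded to two decimals. A minimal sketch of that report, assuming hard-label predictions from the fitted pipeline; blind_test_summary is a hypothetical helper, not a function from the script, and the labels below are a toy example.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef


def blind_test_summary(y_true, y_pred, ndigits=2):
    """Hypothetical helper: MCC and accuracy on a blind test set, rounded."""
    mcc = round(matthews_corrcoef(y_true, y_pred), ndigits)
    acc = round(accuracy_score(y_true, y_pred), ndigits)
    return mcc, acc


# Toy labels: TP=2, TN=1, FP=1, FN=0, so MCC = 2/sqrt(12), about 0.577.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
mcc, acc = blind_test_summary(y_true, y_pred)
print(f"MCC on Blind test: {mcc}\n")       # MCC on Blind test: 0.58
print(f"Accuracy on Blind test: {acc}")    # Accuracy on Blind test: 0.75
```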

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01892829 0.02864075 0.02735972 0.02840042 0.0284605  0.02859259
 0.02847409 0.02875304 0.02716875 0.01797819]

mean value: 0.026275634765625

key: score_time
value: [0.01286125 0.01076007 0.01073647 0.01065683 0.01062083 0.01068902
 0.01063824 0.03288555 0.0109849  0.02092481]

mean value: 0.014175796508789062

key: test_mcc
value: [0.68884672 0.75       0.81409158 0.93933644 0.8819171  1.
 0.87083333 0.9372467  0.87083333 0.9372467 ]

mean value: 0.8690351901199767

key: train_mcc
value: [0.92994649 0.90955652 0.90901439 0.89492115 0.91585639 0.90955652
 0.93741093 0.90253931 0.90988464 0.90897898]

mean value: 0.9127665317222325

key: test_accuracy
value: [0.84375    0.875      0.90625    0.96875    0.9375     1.
 0.93548387 0.96774194 0.93548387 0.96774194]

mean value: 0.9337701612903225

key: train_accuracy
value: [0.96478873 0.95422535 0.95422535 0.9471831  0.95774648 0.95422535
 0.96842105 0.95087719 0.95438596 0.95438596]

mean value: 0.9560464541635779

key: test_fscore
value: [0.84848485 0.875      0.90322581 0.96969697 0.94117647 1.
 0.93333333 0.96551724 0.9375     0.96969697]

mean value: 0.934363163963128

key: train_fscore
value: [0.96527778 0.95532646 0.9550173  0.94809689 0.95833333 0.95532646
 0.96907216 0.95205479 0.95532646 0.95470383]

mean value: 0.9568535471627235

key: test_precision
value: [0.82352941 0.875      0.93333333 0.94117647 0.88888889 1.
 0.93333333 1.         0.9375     0.94117647]

mean value: 0.9273937908496732

key: train_precision
value: [0.95205479 0.93288591 0.93877551 0.93197279 0.94520548 0.93288591
 0.9527027  0.93288591 0.93288591 0.94482759]

mean value: 0.9397082486363004

key: test_recall
value: [0.875      0.875      0.875      1.         1.         1.
 0.93333333 0.93333333 0.9375     1.        ]

mean value: 0.9429166666666666

key: train_recall
value: [0.97887324 0.97887324 0.97183099 0.96478873 0.97183099 0.97887324
 0.98601399 0.97202797 0.97887324 0.96478873]

mean value: 0.9746774352408155

key: test_roc_auc
value: [0.84375    0.875      0.90625    0.96875    0.9375     1.
 0.93541667 0.96666667 0.93541667 0.96666667]

mean value: 0.9335416666666667

key: train_roc_auc
value: [0.96478873 0.95422535 0.95422535 0.9471831  0.95774648 0.95422535
 0.96835911 0.95080272 0.95447158 0.95442234]

mean value: 0.9560450113267015

key: test_jcc
value: [0.73684211 0.77777778 0.82352941 0.94117647 0.88888889 1.
 0.875      0.93333333 0.88235294 0.94117647]

mean value: 0.8800077399380805

key: train_jcc
value: [0.93288591 0.91447368 0.91390728 0.90131579 0.92       0.91447368
 0.94       0.90849673 0.91447368 0.91333333]

mean value: 0.9173360098273221

MCC on Blind test: 0.22

Accuracy on Blind test: 0.49

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.16565156 0.08813143 0.17730904 0.20824528 0.18379951 0.1740315
 0.17967129 0.17941952 0.18022633 0.19038606]

mean value: 0.17268714904785157

key: score_time
value: [0.0107646  0.01237965 0.01942182 0.01081586 0.01998901 0.01066494
 0.01124215 0.01264286 0.01966715 0.02074218]

mean value: 0.014833021163940429

key: test_mcc
value: [0.81409158 0.75       0.81409158 1.         0.8819171  1.
 0.87083333 1.         1.         0.87770745]

mean value: 0.900864104531543

key: train_mcc
value: [0.95129413 0.94450549 0.94403659 0.93720088 0.94403659 0.93720088
 0.93741093 0.93741093 0.93130575 0.95146839]

mean value: 0.9415870567033926

key: test_accuracy
value: [0.90625    0.875      0.90625    1.         0.9375     1.
 0.93548387 1.         1.         0.93548387]

mean value: 0.9495967741935484

key: train_accuracy
value: [0.97535211 0.97183099 0.97183099 0.96830986 0.97183099 0.96830986
 0.96842105 0.96842105 0.96491228 0.9754386 ]

mean value: 0.9704657771188535

key: test_fscore
value: [0.90909091 0.875      0.90322581 1.         0.94117647 1.
 0.93333333 1.         1.         0.94117647]

mean value: 0.9503002990052326

key: train_fscore
value: [0.97577855 0.97241379 0.97222222 0.96885813 0.97222222 0.96885813
 0.96907216 0.96907216 0.96575342 0.97577855]

mean value: 0.9710029348503718

key: test_precision
value: [0.88235294 0.875      0.93333333 1.         0.88888889 1.
 0.93333333 1.         1.         0.88888889]

mean value: 0.9401797385620915

key: train_precision
value: [0.95918367 0.9527027  0.95890411 0.95238095 0.95890411 0.95238095
 0.9527027  0.9527027  0.94       0.95918367]

mean value: 0.953904557898687

key: test_recall
value: [0.9375     0.875      0.875      1.         1.         1.
 0.93333333 1.         1.         1.        ]

mean value: 0.9620833333333333

key: train_recall
value: [0.99295775 0.99295775 0.98591549 0.98591549 0.98591549 0.98591549
 0.98601399 0.98601399 0.99295775 0.99295775]

mean value: 0.9887520929774452

key: test_roc_auc
value: [0.90625    0.875      0.90625    1.         0.9375     1.
 0.93541667 1.         1.         0.93333333]

mean value: 0.949375

key: train_roc_auc
value: [0.97535211 0.97183099 0.97183099 0.96830986 0.97183099 0.96830986
 0.96835911 0.96835911 0.96501034 0.97549985]

mean value: 0.9704693194129814

key: test_jcc
value: [0.83333333 0.77777778 0.82352941 1.         0.88888889 1.
 0.875      1.         1.         0.88888889]

mean value: 0.9087418300653595

key: train_jcc
value: [0.9527027  0.94630872 0.94594595 0.93959732 0.94594595 0.93959732
 0.94       0.94       0.93377483 0.9527027 ]

mean value: 0.9436575487439082

MCC on Blind test: 0.2

Accuracy on Blind test: 0.43
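The SettingWithCopyWarning messages in this log (katg_config.py, lines 183 and 186) are raised because rus_CT and rus_BT are sorted in place while pandas still treats them as slices of a larger DataFrame. A minimal sketch of two common fixes, assuming the tables are selected from a larger results frame; the results table and its "resampling" column are hypothetical, and the MCC values are rounded from the means logged above.

```python
import pandas as pd

# Hypothetical results table; values rounded from this log's mean test MCC
# and blind-test MCC for the three rus models shown.
results = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "resampling": ["rus", "rus", "rus"],
    "test_mcc": [0.68, 0.87, 0.90],
    "bts_mcc": [0.20, 0.22, 0.20],
})

# Fix 1: take an explicit copy, so the in-place sort acts on an
# independent DataFrame rather than on something pandas flags as a slice.
rus_CT = results.loc[results["resampling"] == "rus", ["model", "test_mcc"]].copy()
rus_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Fix 2: skip inplace and rebind the name to the sorted result.
rus_BT = results.loc[results["resampling"] == "rus", ["model", "bts_mcc"]]
rus_BT = rus_BT.sort_values(by=["bts_mcc"], ascending=False)

print(rus_CT)
print(rus_BT)
```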

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.02582741 0.02434468 0.02545023 0.02682304 0.02640653 0.02720022
 0.02612591 0.02765298 0.02734327 0.04823351]

mean value: 0.028540778160095214

key: score_time
value: [0.01104569 0.01089406 0.01136374 0.01076293 0.01096463 0.01084781
 0.01096702 0.01096272 0.01116037 0.01098251]

mean value: 0.010995149612426758

key: test_mcc
value: [0.81325006 0.87831007 0.80813523 0.78446454 0.77459667 0.83914639
 0.80813523 0.90748521 0.73763441 0.77382584]

mean value: 0.8124983647487063

key: train_mcc
value: [0.83119879 0.83472681 0.83507281 0.87790234 0.85985131 0.84227171
 0.84207536 0.83472681 0.85645761 0.83886705]

mean value: 0.8453150611021845

key: test_accuracy
value: [0.90322581 0.93548387 0.90322581 0.88709677 0.88709677 0.91935484
 0.90322581 0.9516129  0.86885246 0.8852459 ]

mean value: 0.9044420941300899

key: train_accuracy
value: [0.91546763 0.91726619 0.91726619 0.93884892 0.92985612 0.92086331
 0.92086331 0.91726619 0.92818671 0.91921005]

mean value: 0.9225094610128773

key: test_fscore
value: [0.90909091 0.93939394 0.90625    0.87719298 0.88888889 0.92063492
 0.90625    0.95384615 0.86666667 0.89230769]

mean value: 0.9060522153285311

key: train_fscore
value: [0.91651865 0.91814947 0.91872792 0.93950178 0.93048128 0.92226148
 0.92198582 0.91814947 0.92882562 0.92035398]

mean value: 0.9234955465227851

key: test_precision
value: [0.85714286 0.88571429 0.87878788 0.96153846 0.875      0.90625
 0.87878788 0.91176471 0.86666667 0.85294118]

mean value: 0.887459391099097

key: train_precision
value: [0.90526316 0.9084507  0.90277778 0.92957746 0.92226148 0.90625
 0.90909091 0.9084507  0.92226148 0.90592334]

mean value: 0.9120307031148476

key: test_recall
value: [0.96774194 1.         0.93548387 0.80645161 0.90322581 0.93548387
 0.93548387 1.         0.86666667 0.93548387]

mean value: 0.9286021505376344

key: train_recall
value: [0.92805755 0.92805755 0.9352518  0.94964029 0.93884892 0.93884892
 0.9352518  0.92805755 0.93548387 0.9352518 ]

mean value: 0.9352750058018101

key: test_roc_auc
value: [0.90322581 0.93548387 0.90322581 0.88709677 0.88709677 0.91935484
 0.90322581 0.9516129  0.8688172  0.8844086 ]

mean value: 0.9043548387096774

key: train_roc_auc
value: [0.91546763 0.91726619 0.91726619 0.93884892 0.92985612 0.92086331
 0.92086331 0.91726619 0.92817359 0.9192388 ]

mean value: 0.922511023439313

key: test_jcc
value: [0.83333333 0.88571429 0.82857143 0.78125    0.8        0.85294118
 0.82857143 0.91176471 0.76470588 0.80555556]

mean value: 0.8292407796451914

key: train_jcc
value: [0.84590164 0.84868421 0.8496732  0.88590604 0.87       0.8557377
 0.85526316 0.84868421 0.86710963 0.85245902]

mean value: 0.8579418817037436

MCC on Blind test: 0.21

Accuracy on Blind test: 0.53

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
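This ConvergenceWarning comes from the lbfgs solver inside LogisticRegressionCV reaching its iteration limit (default max_iter=100). The pipeline in this log already min-max scales the numerical features, so the warning's other suggestion is to raise max_iter. A minimal sketch on synthetic data; max_iter=1000 is an illustrative value, not one taken from the script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (no redundant features, some label noise).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Same scaling as the log's pipeline, with a larger iteration budget.
model = make_pipeline(MinMaxScaler(),
                      LogisticRegressionCV(max_iter=1000, random_state=42))

# Record warnings during fitting so the ConvergenceWarnings can be counted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence)
```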
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
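Before fitting, the pipeline above min-max scales the numerical columns (`MinMaxScaler`), one-hot encodes the seven categorical columns (`OneHotEncoder`), and passes any remaining columns through (`remainder='passthrough'`). The following is a pure-Python sketch of those two transforms on a toy table. The column values and the `ss_class` labels are hypothetical, not taken from this dataset. Real code would keep using the sklearn transformers, as the pipeline does.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Guard against a constant column, which would otherwise divide by zero
    return [0.0 if span == 0 else (x - lo) / span for x in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


# Hypothetical toy columns: one numerical, one categorical
ligand_distance = [2.0, 10.0, 6.0]
ss_class = ["helix", "strand", "helix"]

scaled = min_max_scale(ligand_distance)
cats, encoded = one_hot(ss_class)

# Concatenate per row: numerical block first, then categorical indicators,
# mirroring the order of the transformers listed in the ColumnTransformer
rows = [[s, *e] for s, e in zip(scaled, encoded)]
print(cats)  # ['helix', 'strand']
print(rows)  # [[0.0, 1, 0], [1.0, 0, 1], [0.5, 1, 0]]
```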

key: fit_time
value: [0.77656746 0.70605206 0.85919523 0.69673634 0.68120766 0.78336358
 0.78208661 0.70059681 0.83346748 0.76766825]

mean value: 0.7586941480636596

key: score_time
value: [0.01191044 0.01261759 0.01475716 0.01256537 0.0127914  0.01143336
 0.01280951 0.01418829 0.01239324 0.01240849]

mean value: 0.012787485122680664

key: test_mcc
value: [0.90369611 0.93743687 0.90369611 0.82199494 0.84266484 0.93743687
 0.90369611 0.87278605 0.87055472 0.96770777]

mean value: 0.8961670394093372
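MCC is computed from the confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Each "mean value" line is the plain arithmetic mean of the ten per-fold scores. The sketch below checks both against the first test fold and the test_mcc list above. The fold-1 counts (TP=29, FP=1, FN=2, TN=30) are not logged. They are inferred from that fold's precision (29/30), recall (29/31) and accuracy (59/62), so treat them as an assumption.

```python
import math
from statistics import mean


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: return 0.0 when any marginal total is zero
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Inferred (not logged) counts for test fold 1
fold1 = mcc(tp=29, fp=1, fn=2, tn=30)

# Per-fold test_mcc values copied from the log above
test_mcc = [0.90369611, 0.93743687, 0.90369611, 0.82199494, 0.84266484,
            0.93743687, 0.90369611, 0.87278605, 0.87055472, 0.96770777]

print(round(fold1, 8))           # 0.90369611
print(round(mean(test_mcc), 8))  # 0.89616704
```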

key: train_mcc
value: [0.97124816 0.96043787 0.96402878 0.94966486 0.9497386  0.96402878
 0.94986154 0.96405373 0.97487139 0.96768995]

mean value: 0.9615623654982854

key: test_accuracy
value: [0.9516129  0.96774194 0.9516129  0.90322581 0.91935484 0.96774194
 0.9516129  0.93548387 0.93442623 0.98360656]

mean value: 0.9466419883659439

key: train_accuracy
value: [0.98561151 0.98021583 0.98201439 0.97482014 0.97482014 0.98201439
 0.97482014 0.98201439 0.98743268 0.98384201]

mean value: 0.9807605621068675

key: test_fscore
value: [0.95081967 0.96666667 0.95081967 0.89285714 0.92307692 0.96875
 0.95238095 0.93333333 0.93103448 0.98412698]

mean value: 0.9453865829462917

key: train_fscore
value: [0.98566308 0.98018018 0.98201439 0.97491039 0.975      0.98201439
 0.97508897 0.98207885 0.98747764 0.98378378]

mean value: 0.9808211677303444

key: test_precision
value: [0.96666667 1.         0.96666667 1.         0.88235294 0.93939394
 0.9375     0.96551724 0.96428571 0.96875   ]

mean value: 0.9591133169568768

key: train_precision
value: [0.98214286 0.98194946 0.98201439 0.97142857 0.96808511 0.98201439
 0.96478873 0.97857143 0.98571429 0.98555957]

mean value: 0.9782268783883663

key: test_recall
value: [0.93548387 0.93548387 0.93548387 0.80645161 0.96774194 1.
 0.96774194 0.90322581 0.9        1.        ]

mean value: 0.9351612903225807

key: train_recall
value: [0.98920863 0.97841727 0.98201439 0.97841727 0.98201439 0.98201439
 0.98561151 0.98561151 0.98924731 0.98201439]

mean value: 0.9834571052835152

key: test_roc_auc
value: [0.9516129  0.96774194 0.9516129  0.90322581 0.91935484 0.96774194
 0.9516129  0.93548387 0.93387097 0.98333333]

mean value: 0.9465591397849463

key: train_roc_auc
value: [0.98561151 0.98021583 0.98201439 0.97482014 0.97482014 0.98201439
 0.97482014 0.98201439 0.98742941 0.98383874]

mean value: 0.9807599082024703

key: test_jcc
value: [0.90625    0.93548387 0.90625    0.80645161 0.85714286 0.93939394
 0.90909091 0.875      0.87096774 0.96875   ]

mean value: 0.8974780931434158

key: train_jcc
value: [0.97173145 0.96113074 0.96466431 0.95104895 0.95121951 0.96466431
 0.95138889 0.96478873 0.97526502 0.96808511]

mean value: 0.9623987021299
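Each key / value / mean value group above matches the layout of the dictionary returned by scikit-learn's cross_validate with return_train_score=True. It starts with fit_time and score_time, then has a test_ and a train_ entry for each scorer, each holding one number per fold (10 here). The sketch below reproduces that layout on synthetic data. The scorer names mcc, fscore and jcc are taken from the keys in the log. The scorer definitions, the 10-fold StratifiedKFold and the dataset are assumptions, because the script that produced this log is not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical scorer dict; names chosen to match the keys printed in the log
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predict() output
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in for the real training table
X, y = make_classification(n_samples=620, n_features=20, random_state=42)

cv_results = cross_validate(
    LogisticRegression(random_state=42), X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    scoring=scoring,
    return_train_score=True,  # adds the train_* keys
)

# Print in the same key / value / mean value layout as the log
for key, value in cv_results.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {np.mean(value)}\n")
```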

MCC on Blind test: 0.14

Accuracy on Blind test: 0.35
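The two Blind test lines are single scores on held-out data rather than per-fold arrays. Values printed as 0.5 rather than 0.50 suggest plain round(..., 2). The following is a minimal sketch under that assumption. A synthetic dataset and a hypothetical 25% stratified blind split stand in for the real blind set, which is defined elsewhere in the pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=42)

# Hypothetical blind split; the real one is not shown in this log
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit once on the training data, then score once on the blind set
model = LogisticRegression(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", blind_mcc)
print("\nAccuracy on Blind test:", blind_acc)
```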

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianNB())])
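The pipeline repr above describes two steps. A ColumnTransformer min-max scales the numerical columns, one-hot encodes the categorical ones and passes any other column through unchanged. The classifier follows. The sketch below builds the same structure using a few of the printed column names on made-up random data. The real feature table and the full column lists are not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 40

# Made-up rows; only the column names come from the log
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "coil"], n),
    "active_site": rng.integers(0, 2, n),
})
y = rng.integers(0, 2, n)

num_cols = pd.Index(["ligand_distance", "rsa"])
cat_cols = pd.Index(["ss_class", "active_site"])

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numerical column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category
    ],
    remainder="passthrough",                 # unlisted columns pass through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])
pipe.fit(X, y)

X_prepped = pipe.named_steps["prep"].transform(X)
y_hat = pipe.predict(X)
print(X_prepped.shape)
```

GaussianNB needs dense input. Here the ColumnTransformer returns a dense array because the overall density of its output stays above the default sparse_threshold of 0.3, even though OneHotEncoder itself produces a sparse matrix.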

key: fit_time
value: [0.01096439 0.01291323 0.00820684 0.00848889 0.00764108 0.0079782
 0.00760031 0.00791764 0.00761533 0.00789833]

mean value: 0.008722424507141113

key: score_time
value: [0.01091075 0.00868392 0.00824642 0.00876021 0.00801086 0.00799608
 0.00807261 0.00803852 0.00840473 0.00831747]

mean value: 0.008544158935546876

key: test_mcc
value: [0.67883359 0.64549722 0.7130241  0.52981294 0.74193548 0.7130241
 0.80813523 0.81325006 0.50860215 0.77072165]

mean value: 0.6922836529141403

key: train_mcc
value: [0.71239616 0.71972253 0.72313855 0.6419512  0.73033396 0.70874774
 0.69849277 0.6908084  0.72712387 0.72023891]

mean value: 0.7072954079422489

key: test_accuracy
value: [0.83870968 0.82258065 0.85483871 0.75806452 0.87096774 0.85483871
 0.90322581 0.90322581 0.75409836 0.8852459 ]

mean value: 0.8445795875198308

key: train_accuracy
value: [0.85611511 0.85971223 0.86151079 0.82014388 0.86510791 0.85431655
 0.84892086 0.84532374 0.86355476 0.85996409]

mean value: 0.8534669930124124

key: test_fscore
value: [0.84375    0.82539683 0.86153846 0.72727273 0.87096774 0.86153846
 0.90625    0.90909091 0.75409836 0.88888889]

mean value: 0.8448792376317495

key: train_fscore
value: [0.85765125 0.86170213 0.8627451  0.81343284 0.86631016 0.85561497
 0.85211268 0.84697509 0.86428571 0.86170213]

mean value: 0.8542532047730724

key: test_precision
value: [0.81818182 0.8125     0.82352941 0.83333333 0.87096774 0.82352941
 0.87878788 0.85714286 0.74193548 0.875     ]

mean value: 0.833490793678175

key: train_precision
value: [0.84859155 0.84965035 0.85512367 0.84496124 0.85865724 0.84805654
 0.83448276 0.83802817 0.86120996 0.84965035]

mean value: 0.8488411836784526

key: test_recall
value: [0.87096774 0.83870968 0.90322581 0.64516129 0.87096774 0.90322581
 0.93548387 0.96774194 0.76666667 0.90322581]

mean value: 0.8605376344086022

key: train_recall
value: [0.86690647 0.87410072 0.8705036  0.78417266 0.87410072 0.86330935
 0.8705036  0.85611511 0.86738351 0.87410072]

mean value: 0.8601196462185091

key: test_roc_auc
value: [0.83870968 0.82258065 0.85483871 0.75806452 0.87096774 0.85483871
 0.90322581 0.90322581 0.75430108 0.88494624]

mean value: 0.8445698924731183

key: train_roc_auc
value: [0.85611511 0.85971223 0.86151079 0.82014388 0.86510791 0.85431655
 0.84892086 0.84532374 0.86354787 0.85998943]

mean value: 0.8534688378329595

key: test_jcc
value: [0.72972973 0.7027027  0.75675676 0.57142857 0.77142857 0.75675676
 0.82857143 0.83333333 0.60526316 0.8       ]

mean value: 0.7355971008602588

key: train_jcc
value: [0.75077882 0.75700935 0.75862069 0.68553459 0.76415094 0.74766355
 0.74233129 0.7345679  0.76100629 0.75700935]

mean value: 0.7458672762322701

MCC on Blind test: 0.21

Accuracy on Blind test: 0.57
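For binary labels, the Jaccard score J = TP/(TP+FP+FN) and the F1 score F = 2TP/(2TP+FP+FN) are linked by J = F/(2 - F). As a result, test_jcc carries no information beyond test_fscore. The check below applies that identity to the Gaussian NB test_fscore values copied from the block above. It recovers the logged test_jcc values to within rounding.

```python
import numpy as np

# Per-fold Gaussian NB values copied from the log above
test_fscore = np.array([0.84375, 0.82539683, 0.86153846, 0.72727273, 0.87096774,
                        0.86153846, 0.90625, 0.90909091, 0.75409836, 0.88888889])
test_jcc = np.array([0.72972973, 0.7027027, 0.75675676, 0.57142857, 0.77142857,
                     0.75675676, 0.82857143, 0.83333333, 0.60526316, 0.8])

# J = F / (2 - F) holds exactly for binary classification
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.round(jcc_from_f1, 8))
```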

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.00830388 0.00819135 0.00802517 0.00794339 0.00828314 0.00875449
 0.00812817 0.0083952  0.0085578  0.00874829]

mean value: 0.008333086967468262

key: score_time
value: [0.00846505 0.00840521 0.008214   0.00820637 0.00816274 0.00835466
 0.00821352 0.00871086 0.0093646  0.00880384]

mean value: 0.00849008560180664

key: test_mcc
value: [0.51639778 0.56761348 0.61290323 0.65372045 0.74348441 0.5809475
 0.58834841 0.7130241  0.58264312 0.54086022]

mean value: 0.6099942679846233

key: train_mcc
value: [0.62249953 0.6079176  0.63414469 0.60794907 0.59713776 0.61543051
 0.64482423 0.62249953 0.6375268  0.6122178 ]

mean value: 0.620214750789007

key: test_accuracy
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
 0.79032258 0.85483871 0.78688525 0.7704918 ]

mean value: 0.8025118984664199

key: train_accuracy
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
 0.82194245 0.81115108 0.81867145 0.80610413]

mean value: 0.8099595727367837

key: test_fscore
value: [0.75409836 0.74074074 0.80645161 0.80701754 0.875      0.79365079
 0.80597015 0.86153846 0.8        0.77419355]

mean value: 0.8018661210989436

key: train_fscore
value: [0.80874317 0.8036036  0.82167832 0.80500894 0.79928315 0.80438757
 0.82661996 0.81349911 0.82123894 0.80505415]

mean value: 0.8109116928454192

key: test_precision
value: [0.76666667 0.86956522 0.80645161 0.88461538 0.84848485 0.78125
 0.75       0.82352941 0.74285714 0.77419355]

mean value: 0.8047613833070375

key: train_precision
value: [0.81918819 0.80505415 0.79931973 0.80071174 0.79642857 0.81784387
 0.80546075 0.80350877 0.81118881 0.80797101]

mean value: 0.8066675601234072

key: test_recall
value: [0.74193548 0.64516129 0.80645161 0.74193548 0.90322581 0.80645161
 0.87096774 0.90322581 0.86666667 0.77419355]

mean value: 0.8060215053763441

key: train_recall
value: [0.79856115 0.80215827 0.84532374 0.80935252 0.80215827 0.79136691
 0.84892086 0.82374101 0.83154122 0.80215827]

mean value: 0.8155282225832238

key: test_roc_auc
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
 0.79032258 0.85483871 0.78817204 0.77043011]

mean value: 0.8026344086021505

key: train_roc_auc
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
 0.82194245 0.81115108 0.81864831 0.80609706]

mean value: 0.8099565508883215

key: test_jcc
value: [0.60526316 0.58823529 0.67567568 0.67647059 0.77777778 0.65789474
 0.675      0.75675676 0.66666667 0.63157895]

mean value: 0.6711319601335082

key: train_jcc
value: [0.67889908 0.67168675 0.69732938 0.67365269 0.66567164 0.67278287
 0.70447761 0.68562874 0.6966967  0.67371601]

mean value: 0.6820541480667476

MCC on Blind test: 0.18

Accuracy on Blind test: 0.52
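In every block, test_roc_auc equals test_accuracy on the first eight folds, which have 62 rows split 31/31. The two can differ only on the last two folds, which have 61 rows. That pattern is consistent with ROC AUC being computed from hard 0/1 predictions, for example via make_scorer(roc_auc_score), which calls predict. In that case the ROC curve has a single interior point and the AUC reduces to balanced accuracy, (TPR + TNR)/2. This is an inference from the numbers, not something the log states. The sketch checks the identity on random labels.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 61)
y_pred = rng.integers(0, 2, 61)  # hard labels, as returned by predict()

# With only two distinct score values the ROC curve is (0,0) -> (FPR,TPR) -> (1,1),
# whose area is (TPR + 1 - FPR) / 2, i.e. the mean of per-class recall
auc_from_labels = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_from_labels, 6), round(bal_acc, 6))
```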

Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', KNeighborsClassifier())])

key: fit_time
value: [0.00809216 0.00795221 0.00791621 0.00790453 0.00721383 0.00737166
 0.00789046 0.00771284 0.00766468 0.00788069]

mean value: 0.0077599287033081055

key: score_time
value: [0.01314855 0.01184964 0.0114398  0.0146842  0.01099205 0.01099849
 0.01176476 0.0116837  0.01157475 0.01158309]

mean value: 0.01197190284729004

key: test_mcc
value: [0.45760432 0.48488114 0.67883359 0.55301004 0.67883359 0.67883359
 0.54953196 0.74348441 0.40967742 0.70780713]

mean value: 0.5942497191157756

key: train_mcc
value: [0.7125253  0.73779681 0.71605437 0.74499483 0.7125253  0.71313508
 0.726788   0.73745301 0.75237261 0.72554668]

mean value: 0.7279191995608599

key: test_accuracy
value: [0.72580645 0.74193548 0.83870968 0.77419355 0.83870968 0.83870968
 0.77419355 0.87096774 0.70491803 0.85245902]

mean value: 0.7960602855631941

key: train_accuracy
value: [0.85611511 0.86870504 0.85791367 0.87230216 0.85611511 0.85611511
 0.86330935 0.86870504 0.87612208 0.86175943]

mean value: 0.8637162083618563

key: test_fscore
value: [0.70175439 0.75       0.83333333 0.75862069 0.83333333 0.84375
 0.78125    0.86666667 0.7        0.86153846]

mean value: 0.7930246870491879

key: train_fscore
value: [0.8540146  0.86654479 0.856102   0.8702011  0.8540146  0.85239852
 0.86181818 0.86799277 0.87522604 0.85607477]

mean value: 0.8614387366046266

key: test_precision
value: [0.76923077 0.72727273 0.86206897 0.81481481 0.86206897 0.81818182
 0.75757576 0.89655172 0.7        0.82352941]

mean value: 0.8031294954013006

key: train_precision
value: [0.86666667 0.88104089 0.86715867 0.88475836 0.86666667 0.875
 0.87132353 0.87272727 0.88321168 0.89105058]

mean value: 0.8759604326054368

key: test_recall
value: [0.64516129 0.77419355 0.80645161 0.70967742 0.80645161 0.87096774
 0.80645161 0.83870968 0.7        0.90322581]

mean value: 0.7861290322580645

key: train_recall
value: [0.84172662 0.85251799 0.84532374 0.85611511 0.84172662 0.83093525
 0.85251799 0.86330935 0.86738351 0.82374101]

mean value: 0.8475297181609551

key: test_roc_auc
value: [0.72580645 0.74193548 0.83870968 0.77419355 0.83870968 0.83870968
 0.77419355 0.87096774 0.70483871 0.8516129 ]

mean value: 0.7959677419354838

key: train_roc_auc
value: [0.85611511 0.86870504 0.85791367 0.87230216 0.85611511 0.85611511
 0.86330935 0.86870504 0.8761378  0.86169129]

mean value: 0.8637109667105026

key: test_jcc
value: [0.54054054 0.6        0.71428571 0.61111111 0.71428571 0.72972973
 0.64102564 0.76470588 0.53846154 0.75675676]

mean value: 0.6610902628549687

key: train_jcc
value: [0.74522293 0.76451613 0.74840764 0.77022654 0.74522293 0.74276527
 0.7571885  0.76677316 0.77813505 0.74836601]

mean value: 0.7566824165390956

MCC on Blind test: 0.16

Accuracy on Blind test: 0.57

Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SVC(random_state=42))])

key: fit_time
value: [0.01524782 0.01526904 0.01670051 0.01508474 0.01485252 0.01501393
 0.01506996 0.01522017 0.01478338 0.01487541]

mean value: 0.015211749076843261

key: score_time
value: [0.00945497 0.00925422 0.00928378 0.00928211 0.00977159 0.00912595
 0.00927424 0.00921845 0.00913382 0.00917101]

mean value: 0.009297013282775879

key: test_mcc
value: [0.64820372 0.75623534 0.80813523 0.71004695 0.74819006 0.7284928
 0.7190925  0.70116959 0.61256703 0.6844511 ]

mean value: 0.7116584311085777

key: train_mcc
value: [0.78485761 0.79151169 0.79209132 0.85451608 0.77632088 0.78285538
 0.75529076 0.75529076 0.78851732 0.80529218]

mean value: 0.7886543984062245

key: test_accuracy
value: [0.82258065 0.87096774 0.90322581 0.85483871 0.87096774 0.85483871
 0.85483871 0.83870968 0.80327869 0.83606557]

mean value: 0.8510312004230566

key: train_accuracy
value: [0.89028777 0.89388489 0.89388489 0.92625899 0.88489209 0.88848921
 0.87410072 0.87410072 0.89228007 0.90125673]

mean value: 0.8919436084884337

key: test_fscore
value: [0.83076923 0.88235294 0.90625    0.85245902 0.87878788 0.86956522
 0.86567164 0.85714286 0.8125     0.85294118]

mean value: 0.8608439959922817

key: train_fscore
value: [0.8957265  0.89879931 0.8991453  0.92869565 0.89189189 0.89491525
 0.88215488 0.88215488 0.89761092 0.90500864]

mean value: 0.8976103228458596

key: test_precision
value: [0.79411765 0.81081081 0.87878788 0.86666667 0.82857143 0.78947368
 0.80555556 0.76923077 0.76470588 0.78378378]

mean value: 0.8091704107029185

key: train_precision
value: [0.8534202  0.85901639 0.85667752 0.8989899  0.84076433 0.84615385
 0.82911392 0.82911392 0.85667752 0.87043189]

mean value: 0.8540359455885207

key: test_recall
value: [0.87096774 0.96774194 0.93548387 0.83870968 0.93548387 0.96774194
 0.93548387 0.96774194 0.86666667 0.93548387]

mean value: 0.9221505376344086

key: train_recall
value: [0.94244604 0.94244604 0.94604317 0.96043165 0.94964029 0.94964029
 0.94244604 0.94244604 0.94265233 0.94244604]

mean value: 0.9460637941259895

key: test_roc_auc
value: [0.82258065 0.87096774 0.90322581 0.85483871 0.87096774 0.85483871
 0.85483871 0.83870968 0.80430108 0.8344086 ]

mean value: 0.8509677419354839

key: train_roc_auc
value: [0.89028777 0.89388489 0.89388489 0.92625899 0.88489209 0.88848921
 0.87410072 0.87410072 0.89218947 0.90133055]

mean value: 0.8919419303267063

key: test_jcc
value: [0.71052632 0.78947368 0.82857143 0.74285714 0.78378378 0.76923077
 0.76315789 0.75       0.68421053 0.74358974]

mean value: 0.7565401289085499

key: train_jcc
value: [0.81114551 0.81619938 0.81677019 0.86688312 0.80487805 0.80981595
 0.78915663 0.78915663 0.81424149 0.82649842]

mean value: 0.8144745352495302

MCC on Blind test: 0.26

Accuracy on Blind test: 0.5

Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
  warnings.warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MLPClassifier(max_iter=500, random_state=42))])
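The ConvergenceWarning printed before this pipeline comes from MLPClassifier. It is raised for each fit in which the stochastic optimiser reaches max_iter (500 here) before meeting its tolerance. The sketch below records these warnings instead of letting them interleave with stdout. It uses synthetic data and a deliberately small max_iter so that the warning is guaranteed.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=42)


def count_convergence_warnings(max_iter):
    """Fit one MLP and return how many ConvergenceWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        MLPClassifier(max_iter=max_iter, random_state=42).fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# 5 epochs is fewer than the default n_iter_no_change=10, so training cannot stop early
print(count_convergence_warnings(5))
```

The usual remedies are a larger max_iter or early_stopping=True. Whether either would change the MLP scores in this log has not been tested here.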

key: fit_time
value: [1.6481297  1.49946976 1.67077136 1.65997696 1.63008213 1.52724123
 1.67578554 1.65596056 1.49459696 1.68944907]

mean value: 1.6151463270187378

key: score_time
value: [0.01430917 0.01388526 0.01319432 0.01351166 0.01167202 0.01358342
 0.01357841 0.01354527 0.01401711 0.01371384]

mean value: 0.01350104808807373

key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.7190925  0.90369611 0.93743687
 1.         1.         0.83655914 1.        ]

mean value: 0.9268760160039228
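test_mcc above is the per-fold Matthews correlation coefficient. As a minimal pure-Python sketch (independent of the scorer the script actually uses, which this log does not show), MCC can be computed from confusion-matrix counts. The counts in the usage line (TP=28, FP=3, TN=28, FN=2) are reconstructed from fold 9's logged recall (28/30), precision (28/31) and accuracy (56/61), and they reproduce that fold's 0.83655914. The function name is hypothetical.

```python
import math


def mcc_from_counts(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any row/column total is empty.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Counts consistent with fold 9 of the MLP run (30 positives, 31 negatives).
print(round(mcc_from_counts(tp=28, fp=3, tn=28, fn=2), 8))
```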

key: train_mcc
value: [0.99280576 0.99283145 0.99640932 1.         0.99283145 0.99283145
 0.99283145 0.99283145 0.99284434 0.98923428]

mean value: 0.9935450945650737

key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.85483871 0.9516129  0.96774194
 1.         1.         0.91803279 1.        ]

mean value: 0.9627710206240084

key: train_accuracy
value: [0.99640288 0.99640288 0.99820144 1.         0.99640288 0.99640288
 0.99640288 0.99640288 0.99640934 0.994614  ]

mean value: 0.9967642044353745

key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.84210526 0.95081967 0.96875
 1.         1.         0.91803279 1.        ]

mean value: 0.9614662772412258

key: train_fscore
value: [0.99640288 0.99638989 0.9981982  1.         0.99638989 0.99638989
 0.99638989 0.99638989 0.99640288 0.99459459]

mean value: 0.9967548006672231

key: test_precision
value: [1.         1.         0.96774194 0.92307692 0.96666667 0.93939394
 1.         1.         0.90322581 1.        ]

mean value: 0.9700105271073013

key: train_precision
value: [0.99640288 1.         1.         1.         1.         1.
 1.         1.         1.         0.99638989]

mean value: 0.9992792769394593

key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.77419355 0.93548387 1.
 1.         1.         0.93333333 1.        ]

mean value: 0.9546236559139785
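Within a single fold, test_fscore is the harmonic mean of that fold's test_precision and test_recall. A small sketch (hypothetical helper name) checks this against fold 1 of the MLP run, where precision is 1.0 and recall is 0.96774194 (30/31):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Fold 1 of the MLP run: precision 1.0, recall 30/31.
print(round(f1_from_precision_recall(1.0, 30 / 31), 8))
```

The identity holds fold by fold; it does not hold for the logged means, since the mean of per-fold F1 values is generally not the F1 of the mean precision and mean recall.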

key: train_recall
value: [0.99640288 0.99280576 0.99640288 1.         0.99280576 0.99280576
 0.99280576 0.99280576 0.99283154 0.99280576]

mean value: 0.9942471828988423

key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.85483871 0.9516129  0.96774194
 1.         1.         0.91827957 1.        ]

mean value: 0.9627956989247312
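The test_roc_auc values track test_accuracy closely, and in fold 9 (0.91827957) they equal (TPR + TNR) / 2 exactly. That is what ROC AUC reduces to when it is computed from hard 0/1 predictions instead of probabilities or decision scores: the ROC curve then has a single interior point, so its area equals balanced accuracy. Whether the script scores AUC this way is an inference from these numbers, not something the log states. A minimal sketch, reusing counts consistent with fold 9 (TP=28, FP=3, TN=28, FN=2; hypothetical helper name):

```python
def auc_from_hard_predictions(tp: int, fp: int, tn: int, fn: int) -> float:
    """ROC AUC of a classifier that outputs only 0/1 labels.

    The curve runs (0, 0) -> (FPR, TPR) -> (1, 1); its area is
    (1 + TPR - FPR) / 2, which equals balanced accuracy.
    """
    tpr = tp / (tp + fn)  # sensitivity (recall)
    tnr = tn / (tn + fp)  # specificity
    return (tpr + tnr) / 2


print(round(auc_from_hard_predictions(tp=28, fp=3, tn=28, fn=2), 8))
```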

key: train_roc_auc
value: [0.99640288 0.99640288 0.99820144 1.         0.99640288 0.99640288
 0.99640288 0.99640288 0.99641577 0.99461076]

mean value: 0.9967645238647792

key: test_jcc
value: [0.96774194 0.96774194 0.9375     0.72727273 0.90625    0.93939394
 1.         1.         0.84848485 1.        ]

mean value: 0.9294385386119257
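For binary labels the Jaccard index (test_jcc) is a fixed function of F1: with F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN), it follows that J = F1 / (2 - F1). So test_jcc carries the same information as test_fscore fold by fold. A short check (hypothetical helper name) against MLP folds 1 and 4, whose logged F1 values are 60/61 and 16/19:

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the Jaccard index: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)


# MLP folds 1 and 4: test_fscore 0.98360656 (60/61) and 0.84210526 (16/19).
for f1 in (60 / 61, 16 / 19):
    print(round(jaccard_from_f1(f1), 8))  # logged test_jcc: 0.96774194, 0.72727273
```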

key: train_jcc
value: [0.99283154 0.99280576 0.99640288 1.         0.99280576 0.99280576
 0.99280576 0.99280576 0.99283154 0.98924731]

mean value: 0.9935342048941492
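Each cross-validation block follows a fixed "key / value / mean value" layout. The keys (fit_time, score_time, test_*, train_*) are consistent with the dictionary scikit-learn's cross_validate returns when given a scoring dict and return_train_score=True, though the script's own code is not shown here. A minimal sketch of a printer producing this layout from such a dict (the name summarise_cv and the two-fold input are hypothetical):

```python
from statistics import fmean


def summarise_cv(scores: dict) -> dict:
    """Print each metric's per-fold values and their mean; return the means."""
    means = {}
    for key, values in scores.items():
        means[key] = fmean(values)
        print(f"key: {key}")
        print(f"value: {list(values)}")
        print(f"\nmean value: {means[key]}\n")
    return means


# Hypothetical two-fold input, shaped like the per-metric arrays logged above.
example = {"test_mcc": [1.0, 0.5], "train_mcc": [1.0, 1.0]}
summary = summarise_cv(example)
```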

MCC on Blind test: 0.09

Accuracy on Blind test: 0.24

Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', DecisionTreeClassifier(random_state=42))])

key: fit_time
value: [0.01340103 0.01229239 0.00977087 0.00982738 0.00973988 0.01050377
 0.00967884 0.01013613 0.01014376 0.01027131]

mean value: 0.010576534271240234

key: score_time
value: [0.01074123 0.00902033 0.00799775 0.00793815 0.00800681 0.00789976
 0.00837636 0.00792694 0.00824547 0.00833321]

mean value: 0.008448600769042969

key: test_mcc
value: [0.90748521 0.96824584 0.96824584 1.         0.93743687 0.93548387
 0.93743687 0.93743687 0.9344086  0.96774194]

mean value: 0.9493921894362165

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0
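Every Decision Tree fold scores 1.0 on its training split, which is the expected outcome for a DecisionTreeClassifier left at its defaults (no depth limit). A minimal sketch of quantifying the train-versus-held-out gap from the logged per-fold values (the helper name train_test_gap is hypothetical):

```python
from statistics import fmean


def train_test_gap(train: list, test: list) -> float:
    """Mean training score minus mean held-out score across folds."""
    return fmean(train) - fmean(test)


# Decision Tree test_mcc per fold, as logged; train_mcc is 1.0 in every fold.
dt_test_mcc = [0.90748521, 0.96824584, 0.96824584, 1.0, 0.93743687,
               0.93548387, 0.93743687, 0.93743687, 0.9344086, 0.96774194]
dt_train_mcc = [1.0] * 10
print(round(train_test_gap(dt_train_mcc, dt_test_mcc), 4))
```

Here the within-CV gap is only about 0.05, while the same model's blind-test MCC is 0.01, so the CV gap alone understates how poorly the model transfers to the blind set.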

key: test_accuracy
value: [0.9516129  0.98387097 0.98387097 1.         0.96774194 0.96774194
 0.96774194 0.96774194 0.96721311 0.98360656]

mean value: 0.9741142252776309

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.94915254 0.98360656 0.98412698 1.         0.96666667 0.96774194
 0.96666667 0.96666667 0.96666667 0.98360656]

mean value: 0.9734901243404501

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.96875    1.         1.         0.96774194
 1.         1.         0.96666667 1.        ]

mean value: 0.9903158602150538

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.90322581 0.96774194 1.         1.         0.93548387 0.96774194
 0.93548387 0.93548387 0.96666667 0.96774194]

mean value: 0.9579569892473119

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.9516129  0.98387097 0.98387097 1.         0.96774194 0.96774194
 0.96774194 0.96774194 0.9672043  0.98387097]

mean value: 0.9741397849462365

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.90322581 0.96774194 0.96875    1.         0.93548387 0.9375
 0.93548387 0.93548387 0.93548387 0.96774194]

mean value: 0.9486895161290323

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.01

Accuracy on Blind test: 0.2

Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreesClassifier(random_state=42))])

key: fit_time
value: [0.10707998 0.10903525 0.10817385 0.10511184 0.10628986 0.10499215
 0.10362315 0.10446763 0.10430741 0.10113478]

mean value: 0.10542159080505371

key: score_time
value: [0.01860476 0.01862955 0.01860476 0.01870513 0.01833129 0.01816988
 0.01843429 0.01767302 0.01715016 0.01741219]

mean value: 0.01817150115966797

key: test_mcc
value: [0.93548387 1.         0.93548387 0.87831007 0.90369611 0.93743687
 1.         0.96824584 0.90215054 0.93635873]

mean value: 0.9397165895399419

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.96774194 1.         0.96774194 0.93548387 0.9516129  0.96774194
 1.         0.98387097 0.95081967 0.96721311]

mean value: 0.9692226335272343

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.96774194 1.         0.96774194 0.93103448 0.95081967 0.96875
 1.         0.98412698 0.95081967 0.96875   ]

mean value: 0.9689784682115642

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96774194 1.         0.96774194 1.         0.96666667 0.93939394
 1.         0.96875    0.93548387 0.93939394]

mean value: 0.9685172287390029

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 1.         0.96774194 0.87096774 0.93548387 1.
 1.         1.         0.96666667 1.        ]

mean value: 0.9708602150537634

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.96774194 1.         0.96774194 0.93548387 0.9516129  0.96774194
 1.         0.98387097 0.95107527 0.96666667]

mean value: 0.9691935483870968

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.9375     1.         0.9375     0.87096774 0.90625    0.93939394
 1.         0.96875    0.90625    0.93939394]

mean value: 0.9406005620723363

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.2

Accuracy on Blind test: 0.36

Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', ExtraTreeClassifier(random_state=42))])

key: fit_time
value: [0.00863886 0.00797391 0.0083859  0.00775075 0.00766373 0.0079093
 0.00830865 0.00843334 0.00793123 0.00765133]

mean value: 0.008064699172973634

key: score_time
value: [0.00806904 0.00858569 0.00859904 0.00799298 0.00799918 0.00797725
 0.00856709 0.00818801 0.00789118 0.00795794]

mean value: 0.008182740211486817

key: test_mcc
value: [0.75623534 0.87831007 0.87278605 0.83914639 0.84266484 0.64820372
 0.74348441 0.90748521 0.77072165 0.83655914]

mean value: 0.8095596827565272

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.87096774 0.93548387 0.93548387 0.91935484 0.91935484 0.82258065
 0.87096774 0.9516129  0.8852459  0.91803279]

mean value: 0.9029085140137494

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.85714286 0.93103448 0.93333333 0.92063492 0.91525424 0.81355932
 0.86666667 0.94915254 0.88135593 0.91803279]

mean value: 0.8986167081319949

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [0.96       1.         0.96551724 0.90625    0.96428571 0.85714286
 0.89655172 1.         0.89655172 0.93333333]

mean value: 0.9379632594417078

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.77419355 0.87096774 0.90322581 0.93548387 0.87096774 0.77419355
 0.83870968 0.90322581 0.86666667 0.90322581]

mean value: 0.8640860215053763

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.87096774 0.93548387 0.93548387 0.91935484 0.91935484 0.82258065
 0.87096774 0.9516129  0.88494624 0.91827957]

mean value: 0.9029032258064517

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.75       0.87096774 0.875      0.85294118 0.84375    0.68571429
 0.76470588 0.90322581 0.78787879 0.84848485]

mean value: 0.8182668529288548

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.1

Accuracy on Blind test: 0.26

Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(n_estimators=1000, random_state=42))])

key: fit_time
value: [1.34418821 1.34185529 1.3479538  1.36781883 1.42743945 1.3655982
 1.38340139 1.37809682 1.39602447 1.33490944]

mean value: 1.3687285900115966

key: score_time
value: [0.09742594 0.09719825 0.09524751 0.09951448 0.09094286 0.0994525
 0.09763288 0.09727025 0.09892535 0.09526753]

mean value: 0.09688775539398194

key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.96824584 0.96824584 0.96824584
 1.         1.         0.90215054 1.        ]

mean value: 0.9678863591361422

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
 1.         1.         0.95081967 1.        ]

mean value: 0.9837916446324696

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.98360656 0.98412698 0.98412698
 1.         1.         0.95081967 1.        ]

mean value: 0.9837635248000134

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.96774194 1.         0.96875    0.96875
 1.         1.         0.93548387 1.        ]

mean value: 0.9840725806451613

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.96774194 1.         1.
 1.         1.         0.96666667 1.        ]

mean value: 0.983763440860215

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
 1.         1.         0.95107527 1.        ]

mean value: 0.9838172043010753

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96774194 0.96774194 0.9375     0.96774194 0.96875    0.96875
 1.         1.         0.90625    1.        ]

mean value: 0.9684475806451613

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.08

Accuracy on Blind test: 0.19
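Across the five models logged so far, the mean CV test MCC lies between about 0.81 and 0.97, while the blind-test MCC ranges from 0.01 to 0.2, and the two orderings disagree. A small sketch that tabulates the values copied from the blocks above (CV means rounded to four decimals) and ranks the models both ways:

```python
# (mean CV test_mcc, blind-test MCC), copied from the log blocks above.
results = {
    "MLP": (0.9269, 0.09),
    "Decision Tree": (0.9494, 0.01),
    "Extra Trees": (0.9397, 0.2),
    "Extra Tree": (0.8096, 0.1),
    "Random Forest": (0.9679, 0.08),
}

by_cv = sorted(results, key=lambda m: results[m][0], reverse=True)
by_blind = sorted(results, key=lambda m: results[m][1], reverse=True)
print("Ranked by CV MCC:   ", by_cv)
print("Ranked by blind MCC:", by_blind)
```

The top CV model (Random Forest) places fourth on the blind test, which suggests that CV scores alone are a poor guide to model selection for this split.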

Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                                        n_estimators=1000, n_jobs=10,
                                        oob_score=True, random_state=42))])
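The FutureWarning emitted for this model says that `max_features='sqrt'` preserves the old `'auto'` behaviour for RandomForestClassifier and ExtraTreesClassifier. A minimal sketch, assuming the forest's keyword arguments are kept in a plain dict before the estimator is built (the helper and variable names are hypothetical), that performs that substitution; it is only valid for the classifiers the warning names:

```python
def migrate_max_features(params: dict) -> dict:
    """Return a copy of forest-classifier kwargs with max_features='auto' -> 'sqrt'.

    Per the scikit-learn 1.1 FutureWarning, 'sqrt' keeps the previous
    behaviour for RandomForestClassifier and ExtraTreesClassifier.
    """
    updated = dict(params)  # leave the caller's dict untouched
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


# Settings of the 'Random Forest2' model above.
rf2_params = {"max_features": "auto", "min_samples_leaf": 5, "n_estimators": 1000,
              "n_jobs": 10, "oob_score": True, "random_state": 42}
print(migrate_max_features(rf2_params))
```

The returned dict could then be unpacked into the estimator, e.g. `RandomForestClassifier(**migrate_max_features(rf2_params))`, which should silence the warning without changing results.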

key: fit_time
value: [0.99160314 0.89481354 0.90568662 0.9475925  0.90248585 0.94201088
 0.93975306 0.95513034 0.88768649 0.92357564]

mean value: 0.9290338039398194

key: score_time
value: [0.15050101 0.24627447 0.24356008 0.27248359 0.27095199 0.25157189
 0.20301151 0.27629042 0.26423383 0.23688626]

mean value: 0.24157650470733644

key: test_mcc
value: [0.93548387 0.96824584 0.93548387 0.96824584 0.90748521 0.96824584
 1.         0.96824584 0.87082935 0.96770777]

mean value: 0.9489973426546622

key: train_mcc
value: [0.96425338 0.96058703 0.96425338 0.96058703 0.96412858 0.97132357
 0.95353974 0.96412858 0.96783888 0.96065866]

mean value: 0.9631298857914714

key: test_accuracy
value: [0.96774194 0.98387097 0.96774194 0.98387097 0.9516129  0.98387097
 1.         0.98387097 0.93442623 0.98360656]

mean value: 0.9740613432046537

key: train_accuracy
value: [0.98201439 0.98021583 0.98201439 0.98021583 0.98201439 0.98561151
 0.97661871 0.98201439 0.98384201 0.98025135]

mean value: 0.9814812781731527

key: test_fscore
value: [0.96774194 0.98360656 0.96774194 0.98360656 0.95384615 0.98412698
 1.         0.98360656 0.93548387 0.98412698]

mean value: 0.9743887536166753

key: train_fscore
value: [0.98220641 0.98039216 0.98220641 0.98039216 0.98214286 0.98571429
 0.97690941 0.98214286 0.98401421 0.98039216]

mean value: 0.9816512905421962

key: test_precision
value: [0.96774194 1.         0.96774194 1.         0.91176471 0.96875
 1.         1.         0.90625    0.96875   ]

mean value: 0.9690998576850095

key: train_precision
value: [0.97183099 0.97173145 0.97183099 0.97173145 0.9751773  0.9787234
 0.96491228 0.9751773  0.97535211 0.97173145]

mean value: 0.9728198725682946

key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.96774194 1.         1.
 1.         0.96774194 0.96666667 1.        ]

mean value: 0.9805376344086022

key: train_recall
value: [0.99280576 0.98920863 0.99280576 0.98920863 0.98920863 0.99280576
 0.98920863 0.98920863 0.99283154 0.98920863]

mean value: 0.9906500605966839

key: test_roc_auc
value: [0.96774194 0.98387097 0.96774194 0.98387097 0.9516129  0.98387097
 1.         0.98387097 0.93494624 0.98333333]

mean value: 0.9740860215053764

key: train_roc_auc
value: [0.98201439 0.98021583 0.98201439 0.98021583 0.98201439 0.98561151
 0.97661871 0.98201439 0.98382584 0.9802674 ]

mean value: 0.9814812665996235

key: test_jcc
value: [0.9375     0.96774194 0.9375     0.96774194 0.91176471 0.96875
 1.         0.96774194 0.87878788 0.96875   ]

mean value: 0.9506278391121845

key: train_jcc
value: [0.96503497 0.96153846 0.96503497 0.96153846 0.96491228 0.97183099
 0.95486111 0.96491228 0.96853147 0.96153846]

mean value: 0.9639733441646896

MCC on Blind test: 0.1

Accuracy on Blind test: 0.23

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01971221 0.00761104 0.00768089 0.00756931 0.00756836 0.00765538
 0.00759244 0.00763845 0.00757504 0.00766015]

mean value: 0.008826327323913575

key: score_time
value: [0.01263118 0.00788474 0.00787878 0.00782609 0.00785947 0.00789833
 0.00783944 0.00784731 0.00786543 0.00787163]

mean value: 0.008340239524841309

key: test_mcc
value: [0.51639778 0.56761348 0.61290323 0.65372045 0.74348441 0.5809475
 0.58834841 0.7130241  0.58264312 0.54086022]

mean value: 0.6099942679846233

key: train_mcc
value: [0.62249953 0.6079176  0.63414469 0.60794907 0.59713776 0.61543051
 0.64482423 0.62249953 0.6375268  0.6122178 ]

mean value: 0.620214750789007

key: test_accuracy
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
 0.79032258 0.85483871 0.78688525 0.7704918 ]

mean value: 0.8025118984664199

key: train_accuracy
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
 0.82194245 0.81115108 0.81867145 0.80610413]

mean value: 0.8099595727367837

key: test_fscore
value: [0.75409836 0.74074074 0.80645161 0.80701754 0.875      0.79365079
 0.80597015 0.86153846 0.8        0.77419355]

mean value: 0.8018661210989436

key: train_fscore
value: [0.80874317 0.8036036  0.82167832 0.80500894 0.79928315 0.80438757
 0.82661996 0.81349911 0.82123894 0.80505415]

mean value: 0.8109116928454192

key: test_precision
value: [0.76666667 0.86956522 0.80645161 0.88461538 0.84848485 0.78125
 0.75       0.82352941 0.74285714 0.77419355]

mean value: 0.8047613833070375

key: train_precision
value: [0.81918819 0.80505415 0.79931973 0.80071174 0.79642857 0.81784387
 0.80546075 0.80350877 0.81118881 0.80797101]

mean value: 0.8066675601234072

key: test_recall
value: [0.74193548 0.64516129 0.80645161 0.74193548 0.90322581 0.80645161
 0.87096774 0.90322581 0.86666667 0.77419355]

mean value: 0.8060215053763441

key: train_recall
value: [0.79856115 0.80215827 0.84532374 0.80935252 0.80215827 0.79136691
 0.84892086 0.82374101 0.83154122 0.80215827]

mean value: 0.8155282225832238

key: test_roc_auc
value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258
 0.79032258 0.85483871 0.78817204 0.77043011]

mean value: 0.8026344086021505

key: train_roc_auc
value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396
 0.82194245 0.81115108 0.81864831 0.80609706]

mean value: 0.8099565508883215

key: test_jcc
value: [0.60526316 0.58823529 0.67567568 0.67647059 0.77777778 0.65789474
 0.675      0.75675676 0.66666667 0.63157895]

mean value: 0.6711319601335082

key: train_jcc
value: [0.67889908 0.67168675 0.69732938 0.67365269 0.66567164 0.67278287
 0.70447761 0.68562874 0.6966967  0.67371601]

mean value: 0.6820541480667476

MCC on Blind test: 0.18

Accuracy on Blind test: 0.52

Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=None, tree_method=None,
              use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                               interaction_constraints=None, learning_rate=None,
                               max_delta_step=None, max_depth=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, n_estimators=100,
                               n_jobs=None, num_parallel_tree=None,
                               predictor=None, random_state=42, reg_alpha=None,
                               reg_lambda=None, scale_pos_weight=None,
                               subsample=None, tree_method=None,
                               use_label_encoder=False,
                               validate_parameters=None, verbosity=0))])

key: fit_time
value: [0.21862555 0.04956889 0.04996634 0.05186462 0.05506182 0.06219912
 0.06107974 0.06241131 0.05737829 0.05969238]

mean value: 0.07278480529785156

key: score_time
value: [0.01031947 0.00971913 0.00969386 0.00995827 0.01020288 0.00984311
 0.0096755  0.00973344 0.0099237  0.00953674]

mean value: 0.009860610961914063

key: test_mcc
value: [0.96824584 0.96824584 0.93548387 0.96824584 0.96824584 0.96824584
 0.96824584 0.96824584 0.90215054 1.        ]

mean value: 0.9615355264465131

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
 0.98387097 0.98387097 0.95081967 1.        ]

mean value: 0.9805658381808567

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98360656 0.98360656 0.96774194 0.98360656 0.98412698 0.98412698
 0.98360656 0.98360656 0.95081967 1.        ]

mean value: 0.9804848362754233

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.96774194 1.         0.96875    0.96875
 1.         1.         0.93548387 1.        ]

mean value: 0.9840725806451613

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 0.96774194 0.96774194 0.96774194 1.         1.
 0.96774194 0.96774194 0.96666667 1.        ]

mean value: 0.9773118279569892

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097
 0.98387097 0.98387097 0.95107527 1.        ]

mean value: 0.9805913978494624

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96774194 0.96774194 0.9375     0.96774194 0.96875    0.96875
 0.96774194 0.96774194 0.90625    1.        ]

mean value: 0.9619959677419355

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.06

Accuracy on Blind test: 0.2

Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LinearDiscriminantAnalysis())])

key: fit_time
value: [0.01578832 0.04168701 0.05872059 0.01797581 0.01809096 0.03955126
 0.04262829 0.01832151 0.01880884 0.0180583 ]

mean value: 0.028963088989257812

key: score_time
value: [0.01038313 0.01973009 0.01196766 0.01065159 0.01061916 0.02021313
 0.02139711 0.01056767 0.01115489 0.01077628]

mean value: 0.013746070861816406

key: test_mcc
value: [0.93548387 1.         0.93548387 0.87831007 0.87831007 0.96824584
 0.93743687 0.96824584 0.83655914 0.93635873]

mean value: 0.9274434285640426

key: train_mcc
value: [0.94283651 0.9393413  0.94305636 0.93563929 0.95353974 0.9393413
 0.93914669 0.93214329 0.94994909 0.93925798]

mean value: 0.941425155755879

key: test_accuracy
value: [0.96774194 1.         0.96774194 0.93548387 0.93548387 0.98387097
 0.96774194 0.98387097 0.91803279 0.96721311]

mean value: 0.9627181385510312

key: train_accuracy
value: [0.97122302 0.96942446 0.97122302 0.9676259  0.97661871 0.96942446
 0.96942446 0.96582734 0.97486535 0.96947935]

mean value: 0.9705136070676672

key: test_fscore
value: [0.96774194 1.         0.96774194 0.93103448 0.93939394 0.98412698
 0.96875    0.98360656 0.91803279 0.96875   ]

mean value: 0.9629178621509581

key: train_fscore
value: [0.97163121 0.9699115  0.97173145 0.96808511 0.97690941 0.9699115
 0.96980462 0.96637168 0.9751773  0.96980462]

mean value: 0.9709338406138824

key: test_precision
value: [0.96774194 1.         0.96774194 1.         0.88571429 0.96875
 0.93939394 1.         0.90322581 0.93939394]

mean value: 0.9571961841921519

key: train_precision
value: [0.95804196 0.95470383 0.95486111 0.95454545 0.96491228 0.95470383
 0.95789474 0.95121951 0.96491228 0.95789474]

mean value: 0.9573689736486591

key: test_recall
value: [0.96774194 1.         0.96774194 0.87096774 1.         1.
 1.         0.96774194 0.93333333 1.        ]

mean value: 0.970752688172043

key: train_recall
value: [0.98561151 0.98561151 0.98920863 0.98201439 0.98920863 0.98561151
 0.98201439 0.98201439 0.98566308 0.98201439]

mean value: 0.9848972434955261

key: test_roc_auc
value: [0.96774194 1.         0.96774194 0.93548387 0.93548387 0.98387097
 0.96774194 0.98387097 0.91827957 0.96666667]

mean value: 0.9626881720430107

key: train_roc_auc
value: [0.97122302 0.96942446 0.97122302 0.9676259  0.97661871 0.96942446
 0.96942446 0.96582734 0.97484593 0.96950182]

mean value: 0.970513911451484

key: test_jcc
value: [0.9375     1.         0.9375     0.87096774 0.88571429 0.96875
 0.93939394 0.96774194 0.84848485 0.93939394]

mean value: 0.9295446690406368

key: train_jcc
value: [0.94482759 0.94158076 0.94501718 0.93814433 0.95486111 0.94158076
 0.94137931 0.93493151 0.95155709 0.94137931]

mean value: 0.9435258942337567

MCC on Blind test: 0.14

Accuracy on Blind test: 0.35

Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', MultinomialNB())])

key: fit_time
value: [0.01899743 0.00779724 0.00781965 0.00752091 0.00765944 0.00745153
 0.00752687 0.00762939 0.00754023 0.00756288]

mean value: 0.008750557899475098

key: score_time
value: [0.008394   0.00820088 0.00782681 0.00797677 0.00793958 0.00783062
 0.00781059 0.0078342  0.00790739 0.00793242]

mean value: 0.007965326309204102

key: test_mcc
value: [0.61807005 0.74819006 0.67883359 0.64549722 0.67883359 0.63439154
 0.63439154 0.67419986 0.54654832 0.64708149]

mean value: 0.6506037256013296

key: train_mcc
value: [0.66814183 0.65361701 0.66955589 0.67282515 0.64923736 0.67144111
 0.67540424 0.6622781  0.67590132 0.66881107]

mean value: 0.6667213081476084

key: test_accuracy
value: [0.80645161 0.87096774 0.83870968 0.82258065 0.83870968 0.80645161
 0.80645161 0.82258065 0.7704918  0.81967213]

mean value: 0.8203067160232681

key: train_accuracy
value: [0.83093525 0.82374101 0.83093525 0.83273381 0.82014388 0.83273381
 0.83453237 0.82733813 0.83482944 0.83123878]

mean value: 0.8299161747801042

key: test_fscore
value: [0.81818182 0.87878788 0.84375    0.82539683 0.84375    0.82857143
 0.82857143 0.84507042 0.78125    0.8358209 ]

mean value: 0.8329150697566978

key: train_fscore
value: [0.84175084 0.83501684 0.84280936 0.84422111 0.83388704 0.84317032
 0.84511785 0.83946488 0.84563758 0.84175084]

mean value: 0.8412826664142349

key: test_precision
value: [0.77142857 0.82857143 0.81818182 0.8125     0.81818182 0.74358974
 0.74358974 0.75       0.73529412 0.77777778]

mean value: 0.779911501896796

key: train_precision
value: [0.79113924 0.78481013 0.7875     0.78996865 0.77469136 0.79365079
 0.7943038  0.784375   0.79495268 0.79113924]

mean value: 0.7886530890164406

key: test_recall
value: [0.87096774 0.93548387 0.87096774 0.83870968 0.87096774 0.93548387
 0.93548387 0.96774194 0.83333333 0.90322581]

mean value: 0.896236559139785

key: train_recall
value: [0.89928058 0.89208633 0.90647482 0.90647482 0.9028777  0.89928058
 0.9028777  0.9028777  0.90322581 0.89928058]

mean value: 0.9014736597818519

key: test_roc_auc
value: [0.80645161 0.87096774 0.83870968 0.82258065 0.83870968 0.80645161
 0.80645161 0.82258065 0.77150538 0.81827957]

mean value: 0.8202688172043011

key: train_roc_auc
value: [0.83093525 0.82374101 0.83093525 0.83273381 0.82014388 0.83273381
 0.83453237 0.82733813 0.83470643 0.83136072]

mean value: 0.8299160671462831

key: test_jcc
value: [0.69230769 0.78378378 0.72972973 0.7027027  0.72972973 0.70731707
 0.70731707 0.73170732 0.64102564 0.71794872]

mean value: 0.7143569460642631

key: train_jcc
value: [0.72674419 0.71676301 0.7283237  0.73043478 0.71509972 0.72886297
 0.73177843 0.72334294 0.73255814 0.72674419]

mean value: 0.7260652053436807

MCC on Blind test: 0.21

Accuracy on Blind test: 0.5

Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 PassiveAggressiveClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01070261 0.0129571  0.01364112 0.01318789 0.01269341 0.01534224
 0.01468229 0.01412392 0.01440811 0.0143919 ]

mean value: 0.013613057136535645

key: score_time
value: [0.008075   0.01009893 0.00991964 0.01034665 0.01041341 0.01067472
 0.0105691  0.01087594 0.01076126 0.01034617]

mean value: 0.01020808219909668

key: test_mcc
value: [0.82199494 0.93743687 0.93548387 0.81325006 0.87831007 0.74161985
 0.90748521 0.83914639 0.72318666 0.30374645]

mean value: 0.7901660359762814

key: train_mcc
value: [0.87166214 0.92172241 0.94266562 0.92172241 0.91860435 0.69376766
 0.94305636 0.93238486 0.88634645 0.2887174 ]

mean value: 0.8320649673376139

key: test_accuracy
value: [0.90322581 0.96774194 0.96774194 0.90322581 0.93548387 0.85483871
 0.9516129  0.91935484 0.85245902 0.59016393]

mean value: 0.8845848757271285

key: train_accuracy
value: [0.93345324 0.96043165 0.97122302 0.96043165 0.95863309 0.82733813
 0.97122302 0.96582734 0.94075404 0.57630162]

mean value: 0.9065616806375366

key: test_fscore
value: [0.89285714 0.96875    0.96774194 0.89655172 0.93939394 0.87323944
 0.95384615 0.92063492 0.83018868 0.71264368]

mean value: 0.895584761037988

key: train_fscore
value: [0.92979127 0.96126761 0.97153025 0.96126761 0.95971979 0.85185185
 0.97173145 0.9664903  0.93761815 0.7020202 ]

mean value: 0.9213288471474509

key: test_precision
value: [1.         0.93939394 0.96774194 0.96296296 0.88571429 0.775
 0.91176471 0.90625    0.95652174 0.55357143]

mean value: 0.8858920997139276

key: train_precision
value: [0.98393574 0.94137931 0.96126761 0.94137931 0.93515358 0.74594595
 0.95486111 0.94809689 0.992      0.54085603]

mean value: 0.8944875526911704

key: test_recall
value: [0.80645161 1.         0.96774194 0.83870968 1.         1.
 1.         0.93548387 0.73333333 1.        ]

mean value: 0.9281720430107527

key: train_recall
value: [0.88129496 0.98201439 0.98201439 0.98201439 0.98561151 0.99280576
 0.98920863 0.98561151 0.88888889 1.        ]

mean value: 0.9669464428457234

key: test_roc_auc
value: [0.90322581 0.96774194 0.96774194 0.90322581 0.93548387 0.85483871
 0.9516129  0.91935484 0.85053763 0.58333333]

mean value: 0.8837096774193549

key: train_roc_auc
value: [0.93345324 0.96043165 0.97122302 0.96043165 0.95863309 0.82733813
 0.97122302 0.96582734 0.94084732 0.57706093]

mean value: 0.9066469405121065

key: test_jcc
value: [0.80645161 0.93939394 0.9375     0.8125     0.88571429 0.775
 0.91176471 0.85294118 0.70967742 0.55357143]

mean value: 0.818451456829066

key: train_jcc
value: [0.86879433 0.92542373 0.94463668 0.92542373 0.92255892 0.74193548
 0.94501718 0.93515358 0.88256228 0.54085603]

mean value: 0.8632361942955643

MCC on Blind test: 0.1

Accuracy on Blind test: 0.29

Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', SGDClassifier(n_jobs=10, random_state=42))])

key: fit_time
value: [0.01686311 0.01279736 0.01273036 0.01324439 0.01300955 0.01324821
 0.01373839 0.01282573 0.01237702 0.01400542]

mean value: 0.013483953475952149

key: score_time
value: [0.01079369 0.01044965 0.01073813 0.0103972  0.01031804 0.01030827
 0.01035023 0.01034307 0.01035166 0.01030016]

mean value: 0.010435009002685547

key: test_mcc
value: [0.87831007 0.74161985 0.78446454 0.71567809 0.79471941 0.93548387
 0.96824584 0.84983659 0.77072165 0.90586325]

mean value: 0.8344943153997917

key: train_mcc
value: [0.92518498 0.76865678 0.81406658 0.92923662 0.90265061 0.89965316
 0.92844206 0.89154571 0.92828039 0.93998809]

mean value: 0.8927704971400476

key: test_accuracy
value: [0.93548387 0.85483871 0.88709677 0.83870968 0.88709677 0.96774194
 0.98387097 0.91935484 0.8852459  0.95081967]

mean value: 0.9110259122157589

key: train_accuracy
value: [0.96223022 0.87230216 0.89928058 0.96402878 0.94964029 0.94964029
 0.96402878 0.9442446  0.96409336 0.96947935]

mean value: 0.9438968394404763

key: test_fscore
value: [0.93103448 0.83018868 0.89552239 0.80769231 0.89855072 0.96774194
 0.98360656 0.9122807  0.88135593 0.95384615]

mean value: 0.9061819863058443

key: train_fscore
value: [0.96146789 0.85420945 0.90819672 0.96309963 0.95172414 0.94890511
 0.96350365 0.94183865 0.96441281 0.97012302]

mean value: 0.9427481068247102

key: test_precision
value: [1.         1.         0.83333333 1.         0.81578947 0.96774194
 1.         1.         0.89655172 0.91176471]

mean value: 0.9425181172521699

key: train_precision
value: [0.98127341 0.99521531 0.83433735 0.98863636 0.91390728 0.96296296
 0.97777778 0.98431373 0.95759717 0.94845361]

mean value: 0.9544474964669887

key: test_recall
value: [0.87096774 0.70967742 0.96774194 0.67741935 1.         0.96774194
 0.96774194 0.83870968 0.86666667 1.        ]

mean value: 0.8866666666666667

key: train_recall
value: [0.94244604 0.74820144 0.99640288 0.93884892 0.99280576 0.9352518
 0.94964029 0.9028777  0.97132616 0.99280576]

mean value: 0.937060674041412

key: test_roc_auc
value: [0.93548387 0.85483871 0.88709677 0.83870968 0.88709677 0.96774194
 0.98387097 0.91935484 0.88494624 0.95      ]

mean value: 0.9109139784946236

key: train_roc_auc
value: [0.96223022 0.87230216 0.89928058 0.96402878 0.94964029 0.94964029
 0.96402878 0.9442446  0.96408035 0.96952116]

mean value: 0.9438997189345298

key: test_jcc
value: [0.87096774 0.70967742 0.81081081 0.67741935 0.81578947 0.9375
 0.96774194 0.83870968 0.78787879 0.91176471]

mean value: 0.832825990728842

key: train_jcc
value: [0.92579505 0.74551971 0.83183183 0.92882562 0.90789474 0.90277778
 0.92957746 0.89007092 0.93127148 0.94197952]

mean value: 0.8935544122114777

MCC on Blind test: 0.1

Accuracy on Blind test: 0.4
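The SGDClassifier pipeline printed above has three parts: MinMaxScaler on the numeric columns, OneHotEncoder on the categorical columns, and remainder='passthrough'. The per-fold keys fit_time, score_time and test_*/train_* follow the output format of scikit-learn's `cross_validate` with `return_train_score=True`. The sketch below is a hypothetical reconstruction on synthetic data, not the script that produced this log. Several pieces are assumptions: the column names are a small subset of the real ones, the data are random, the 10-fold shuffled StratifiedKFold is assumed, and the scorer definitions (in particular `mcc`, `jcc` and `roc_auc`) are guesses. The logged test_roc_auc values closely track test_accuracy, which is consistent with ROC AUC being computed on hard labels, so the original scorer may differ from the `"roc_auc"` string used here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (hypothetical): two numeric and two categorical columns
# whose names are borrowed from the log; the values are random.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = rng.integers(0, 2, n)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Same layout as the logged pipeline: scale numerics, one-hot categoricals, then classify.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    ("model", SGDClassifier(n_jobs=10, random_state=42)),
])

# Scorer names chosen so cross_validate emits test_mcc, train_mcc, ..., test_jcc, train_jcc.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout used throughout this log.
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```

Because the data are random, the scores printed by this sketch are meaningless; it only illustrates how the log's structure maps onto `cross_validate`.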

Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', AdaBoostClassifier(random_state=42))])

key: fit_time
value: [0.10854602 0.09391761 0.09340096 0.09336042 0.09349442 0.0939045
 0.09685636 0.09437943 0.09400725 0.09450531]

mean value: 0.09563722610473632

key: score_time
value: [0.01416063 0.01400757 0.01419139 0.0142355  0.01414442 0.01419091
 0.01533508 0.01431847 0.01418138 0.0142591 ]

mean value: 0.014302444458007813

key: test_mcc
value: [0.96824584 1.         0.96824584 0.96824584 0.96824584 0.96824584
 1.         0.96824584 0.90215054 1.        ]

mean value: 0.9711625556945535

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 1.         0.98387097 0.98387097 0.98387097 0.98387097
 1.         0.98387097 0.95081967 1.        ]

mean value: 0.985404547858276

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98360656 1.         0.98412698 0.98360656 0.98412698 0.98412698
 1.         0.98360656 0.95081967 1.        ]

mean value: 0.9854020296643248

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.96875    1.         0.96875    0.96875
 1.         1.         0.93548387 1.        ]

mean value: 0.9841733870967742

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 1.         1.         0.96774194 1.         1.
 1.         0.96774194 0.96666667 1.        ]

mean value: 0.986989247311828

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 1.         0.98387097 0.98387097 0.98387097 0.98387097
 1.         0.98387097 0.95107527 1.        ]

mean value: 0.9854301075268818

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96774194 1.         0.96875    0.96774194 0.96875    0.96875
 1.         0.96774194 0.90625    1.        ]

mean value: 0.9715725806451613

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.09

Accuracy on Blind test: 0.21

Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
  oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model',
                 BaggingClassifier(n_jobs=10, oob_score=True,
                                   random_state=42))])

key: fit_time
value: [0.03534484 0.0434587  0.05311441 0.03592443 0.03713274 0.05070066
 0.05334473 0.05310988 0.04448533 0.03317356]

mean value: 0.04397892951965332

key: score_time
value: [0.022789   0.0229876  0.02233076 0.01710248 0.01946139 0.03598452
 0.02479911 0.02968454 0.01835775 0.03061008]

mean value: 0.024410724639892578

key: test_mcc
value: [0.93743687 0.93743687 0.93548387 0.93743687 0.93548387 0.96824584
 0.96824584 0.87831007 0.90215054 0.96774194]

mean value: 0.9367972553494428

key: train_mcc
value: [1.         0.99640932 0.99640932 0.99640932 1.         1.
 0.99283145 1.         0.99641572 0.99641572]

mean value: 0.9974890870152905

key: test_accuracy
value: [0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097
 0.98387097 0.93548387 0.95081967 0.98360656]

mean value: 0.9676361713379165

key: train_accuracy
value: [1.         0.99820144 0.99820144 0.99820144 1.         1.
 0.99640288 1.         0.99820467 0.99820467]

mean value: 0.9987416529971714

key: test_fscore
value: [0.96666667 0.96666667 0.96774194 0.96666667 0.96774194 0.98412698
 0.98360656 0.93103448 0.95081967 0.98360656]

mean value: 0.9668678124738592

key: train_fscore
value: [1.         0.9981982  0.9981982  0.9981982  1.         1.
 0.99638989 1.         0.99821109 0.9981982 ]

mean value: 0.9987393775723891

key: test_precision
value: [1.         1.         0.96774194 1.         0.96774194 0.96875
 1.         1.         0.93548387 1.        ]

mean value: 0.9839717741935484

key: train_precision
value: [1.         1.         1.         1.         1.         1.
 1.         1.         0.99642857 1.        ]

mean value: 0.9996428571428572

key: test_recall
value: [0.93548387 0.93548387 0.96774194 0.93548387 0.96774194 1.
 0.96774194 0.87096774 0.96666667 0.96774194]

mean value: 0.951505376344086

key: train_recall
value: [1.         0.99640288 0.99640288 0.99640288 1.         1.
 0.99280576 1.         1.         0.99640288]

mean value: 0.9978417266187051

key: test_roc_auc
value: [0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097
 0.98387097 0.93548387 0.95107527 0.98387097]

mean value: 0.9676881720430108

key: train_roc_auc
value: [1.         0.99820144 0.99820144 0.99820144 1.         1.
 0.99640288 1.         0.99820144 0.99820144]

mean value: 0.9987410071942446

key: test_jcc
value: [0.93548387 0.93548387 0.9375     0.93548387 0.9375     0.96875
 0.96774194 0.87096774 0.90625    0.96774194]

mean value: 0.9362903225806452

key: train_jcc
value: [1.         0.99640288 0.99640288 0.99640288 1.         1.
 0.99280576 1.         0.99642857 0.99640288]

mean value: 0.9974845837615622

MCC on Blind test: 0.06

Accuracy on Blind test: 0.21
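The UserWarning and RuntimeWarning from `sklearn/ensemble/_bagging.py` in the Bagging Classifier block above come from out-of-bag (OOB) scoring. `BaggingClassifier` uses 10 estimators by default. A training row that appears in every bootstrap sample never receives an OOB prediction, so its row of `predictions` sums to zero, and the division on line 753 becomes 0/0, giving NaN. The arithmetic below estimates how many rows that affects. Two inputs are assumptions: about 556 rows per training fold (inferred from the logged train metrics), and the default `n_estimators=10`. The script checks the closed-form estimate with a small Monte Carlo simulation.

```python
import numpy as np

# Assumed inputs: ~556 training rows per CV fold and BaggingClassifier's default 10 estimators.
n_train = 556
n_estimators = 10

# Probability that a given row is drawn at least once into one bootstrap sample of size n_train.
p_in_bag = 1 - (1 - 1 / n_train) ** n_train      # about 0.632
# Probability that the row is in-bag for every estimator, i.e. it never gets an OOB vote.
p_no_oob = p_in_bag ** n_estimators              # about 0.010
expected_rows_without_oob = n_train * p_no_oob   # about 5.7 rows per fit

# Monte Carlo check: draw bootstrap samples and count rows that are never out-of-bag.
rng = np.random.default_rng(0)
trials = 200
missing = []
for _ in range(trials):
    seen_oob = np.zeros(n_train, dtype=bool)
    for _ in range(n_estimators):
        in_bag = np.zeros(n_train, dtype=bool)
        in_bag[rng.integers(0, n_train, n_train)] = True  # sample with replacement
        seen_oob |= ~in_bag                               # rows left out get an OOB vote
    missing.append(int((~seen_oob).sum()))

print(f"analytic: {expected_rows_without_oob:.2f}  simulated mean: {np.mean(missing):.2f}")
```

Because `p_no_oob` shrinks geometrically with the number of estimators (roughly 0.632 raised to `n_estimators`), a larger `n_estimators` would make these warnings unlikely. That is a general property of bootstrap aggregation, not a change that has been tested on this pipeline.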

Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GaussianProcessClassifier(random_state=42))])

key: fit_time
value: [0.18122554 0.21741056 0.19656324 0.19563127 0.21628571 0.1628089
 0.20888186 0.19683623 0.13708878 0.13875246]

mean value: 0.18514845371246338

key: score_time
value: [0.02055907 0.02068186 0.02069783 0.02077031 0.02077198 0.01287293
 0.02073574 0.02086687 0.01321125 0.02461028]

mean value: 0.019577813148498536

key: test_mcc
value: [0.67741935 0.74819006 0.74348441 0.69047575 0.80813523 0.87278605
 0.81325006 0.81325006 0.54086022 0.74352218]

mean value: 0.7451373368256522

key: train_mcc
value: [0.87415162 0.87059372 0.86758591 0.89596753 0.88157448 0.87455914
 0.87086426 0.86758591 0.87459701 0.88883589]

mean value: 0.8766315468831808

key: test_accuracy
value: [0.83870968 0.87096774 0.87096774 0.83870968 0.90322581 0.93548387
 0.90322581 0.90322581 0.7704918  0.86885246]

mean value: 0.870386039132734

key: train_accuracy
value: [0.93705036 0.9352518  0.93345324 0.94784173 0.94064748 0.93705036
 0.9352518  0.93345324 0.93716338 0.9443447 ]

mean value: 0.9381508078994614

key: test_fscore
value: [0.83870968 0.87878788 0.875      0.82142857 0.9        0.9375
 0.90909091 0.90909091 0.76666667 0.87878788]

mean value: 0.8715062491272169

key: train_fscore
value: [0.93738819 0.93571429 0.93474427 0.94849023 0.94138544 0.9380531
 0.93617021 0.93474427 0.9380531  0.94474153]

mean value: 0.9389484621579285

key: test_precision
value: [0.83870968 0.82857143 0.84848485 0.92       0.93103448 0.90909091
 0.85714286 0.85714286 0.76666667 0.82857143]

mean value: 0.8585415155848971

key: train_precision
value: [0.93238434 0.92907801 0.91695502 0.93684211 0.92982456 0.92334495
 0.92307692 0.91695502 0.92657343 0.93639576]

mean value: 0.9271430114193007

key: test_recall
value: [0.83870968 0.93548387 0.90322581 0.74193548 0.87096774 0.96774194
 0.96774194 0.96774194 0.76666667 0.93548387]

mean value: 0.8895698924731182

key: train_recall
value: [0.94244604 0.94244604 0.95323741 0.96043165 0.95323741 0.95323741
 0.94964029 0.95323741 0.94982079 0.95323741]

mean value: 0.9510971867667156

key: test_roc_auc
value: [0.83870968 0.87096774 0.87096774 0.83870968 0.90322581 0.93548387
 0.90322581 0.90322581 0.77043011 0.86774194]

mean value: 0.870268817204301

key: train_roc_auc
value: [0.93705036 0.9352518  0.93345324 0.94784173 0.94064748 0.93705036
 0.9352518  0.93345324 0.93714061 0.94436064]

mean value: 0.9381501250612413

key: test_jcc
value: [0.72222222 0.78378378 0.77777778 0.6969697  0.81818182 0.88235294
 0.83333333 0.83333333 0.62162162 0.78378378]

mean value: 0.7753360312183841

key: train_jcc
value: [0.88215488 0.87919463 0.87748344 0.90202703 0.88926174 0.88333333
 0.88       0.87748344 0.88333333 0.89527027]

mean value: 0.884954210937499

MCC on Blind test: 0.22

Accuracy on Blind test: 0.49

Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
                       n_estimators=1000, n_jobs=10, oob_score=True,
                       random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', GradientBoostingClassifier(random_state=42))])
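Each block of `key: ... / value: ... / mean value: ...` lines has the shape of scikit-learn's `cross_validate` output when it is run with a dictionary of named scorers and `return_train_score=True`. That output contains `fit_time` and `score_time`, followed by a `test_<name>` and a `train_<name>` entry for each scorer. The sketch below rebuilds a pipeline of the same form, with MinMaxScaler on numeric columns and OneHotEncoder on categorical columns, and prints the results in the same layout. It is not the author's script. The following parts are assumptions:

- The data are synthetic. Only the column names come from the log.
- The 10-fold `StratifiedKFold` setting is inferred from the ten values per key.
- `OneHotEncoder(handle_unknown="ignore")` is used so that a category missing from a training fold does not raise an error.
- The logged `roc_auc` values look like AUC computed from hard class labels, so the sketch uses `make_scorer(roc_auc_score)`. The probability-based `"roc_auc"` scorer would give different numbers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data: only the column names are taken from the log
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 40, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["H", "E", "L"], n),
    "active_site": rng.choice(["0", "1"], n),
})
# Label loosely tied to ligand_distance so there is some signal to learn
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 20).astype(int)

# Scale numeric columns, one-hot encode categorical columns, pass anything else through
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "rsa"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class", "active_site"]),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep),
                       ("model", GradientBoostingClassifier(random_state=42))])

# Scorer names mirror the log keys; AUC on hard labels is an assumption
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same layout as the log
for key, value in scores.items():
    print(f"key: {key}\nvalue: {value}\n\nmean value: {value.mean()}\n")
```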

key: fit_time
value: [0.25665951 0.24086618 0.24189734 0.23880529 0.24180579 0.24213672
 0.24336982 0.24555063 0.24932742 0.25003719]

mean value: 0.2450455904006958

key: score_time
value: [0.00856853 0.0083406  0.00863934 0.00827336 0.00876927 0.00846887
 0.00852108 0.00857925 0.00858474 0.00857282]

mean value: 0.008531785011291504

key: test_mcc
value: [0.96824584 0.96824584 0.93548387 1.         1.         0.96824584
 1.         0.96824584 0.9344086  1.        ]

mean value: 0.9742875819325697
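Each `mean value` line appears to be the arithmetic mean of the ten per-fold scores printed above it. Recomputing the Gradient Boosting test MCC from the rounded per-fold values reproduces the logged mean to about 1e-9. The sketch also computes the fold-to-fold standard deviation, which the log does not report and which helps judge how stable the mean is.

```python
from statistics import mean, stdev

# Per-fold test MCC values for Gradient Boosting, copied from the log
test_mcc = [0.96824584, 0.96824584, 0.93548387, 1.0, 1.0, 0.96824584,
            1.0, 0.96824584, 0.9344086, 1.0]

# Arithmetic mean over the 10 folds, as in the "mean value" line
print(f"mean value: {mean(test_mcc):.8f}")  # 0.97428758
# Sample standard deviation across folds (not printed by the original script)
print(f"fold-to-fold SD: {stdev(test_mcc):.4f}")
```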

key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_accuracy
value: [0.98387097 0.98387097 0.96774194 1.         1.         0.98387097
 1.         0.98387097 0.96721311 1.        ]

mean value: 0.9870438921205711

key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_fscore
value: [0.98360656 0.98360656 0.96774194 1.         1.         0.98412698
 1.         0.98360656 0.96666667 1.        ]

mean value: 0.986935525840867

key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_precision
value: [1.         1.         0.96774194 1.         1.         0.96875
 1.         1.         0.96666667 1.        ]

mean value: 0.9903158602150538

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_recall
value: [0.96774194 0.96774194 0.96774194 1.         1.         1.
 1.         0.96774194 0.96666667 1.        ]

mean value: 0.983763440860215

key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_roc_auc
value: [0.98387097 0.98387097 0.96774194 1.         1.         0.98387097
 1.         0.98387097 0.9672043  1.        ]

mean value: 0.9870430107526882

key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

key: test_jcc
value: [0.96774194 0.96774194 0.9375     1.         1.         0.96875
 1.         0.96774194 0.93548387 1.        ]

mean value: 0.9744959677419355

key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean value: 1.0

MCC on Blind test: 0.05

Accuracy on Blind test: 0.19
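Gradient Boosting scores 1.0 on every training fold and has a mean test MCC of about 0.97 in cross-validation. On the blind test, however, it reaches an MCC of only 0.05. The two blind-test lines have the form produced by fitting once on the full training set, predicting a held-out set that was never used in cross-validation, and rounding to two decimals. The sketch below shows that procedure. It assumes the blind set is a separate labelled dataset, simulated here with `train_test_split` on synthetic `make_classification` data, and it uses `RidgeClassifier` only as a stand-in model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic data; the real script's blind test set is a separate dataset (assumption)
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_bts, y_train, y_bts = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit once on all training data, then score the untouched blind set
model = RidgeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_bts)

bts_mcc = round(matthews_corrcoef(y_bts, y_pred), 2)
bts_accuracy = round(accuracy_score(y_bts, y_pred), 2)
print(f"MCC on Blind test: {bts_mcc}\n")
print(f"Accuracy on Blind test: {bts_accuracy}")
```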

Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
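QuadraticDiscriminantAnalysis emits the "Variables are collinear" warning when the centred training data of a class has rank below the number of features. One plausible source here is the categorical branch of the pipeline. It uses `OneHotEncoder()` with no dropped level, and the dummy columns of a single categorical feature always sum to 1, so they are linearly dependent after centring. The numpy sketch below uses a hypothetical three-level feature to show that dependency. It also shows that dropping one level, as `OneHotEncoder(drop="first")` does, removes that particular dependency. Other correlated features could still trigger the warning. QDA's `reg_param` shrinks the per-class covariance estimates and is another option.

```python
import numpy as np

# Hypothetical categorical feature with three levels, all present (50 rows)
labels = np.tile([0, 1, 2, 1, 0], 10)
full = np.eye(3)[labels]   # one dummy column per level, like OneHotEncoder()
dropped = full[:, 1:]      # first level dropped, like OneHotEncoder(drop="first")

def centred_rank(m):
    """Rank of the column-centred matrix, the quantity QDA checks per class."""
    return int(np.linalg.matrix_rank(m - m.mean(axis=0)))

# Every row of `full` sums to 1, so its centred columns are linearly dependent
print(full.shape[1], centred_rank(full))        # 3 2
print(dropped.shape[1], centred_rank(dropped))  # 2 2
```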
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', QuadraticDiscriminantAnalysis())])

key: fit_time
value: [0.01201487 0.01361275 0.01402354 0.01380372 0.01379108 0.02837563
 0.01576805 0.01669693 0.02459884 0.01624608]

mean value: 0.01689314842224121

key: score_time
value: [0.0111146  0.01098752 0.01094055 0.01091433 0.01093793 0.01111317
 0.01172638 0.01128578 0.01110101 0.01107979]

mean value: 0.011120104789733886

key: test_mcc
value: [0.74193548 0.80813523 0.81325006 0.52297636 0.74819006 0.67419986
 0.67883359 0.81325006 0.72516604 0.71375712]

mean value: 0.7239693864680706

key: train_mcc
value: [0.82567165 0.81659431 0.79995316 0.7380124  0.83549358 0.78285538
 0.76623167 0.78683637 0.87297353 0.8490525 ]

mean value: 0.8073674571945186

key: test_accuracy
value: [0.87096774 0.90322581 0.90322581 0.75806452 0.87096774 0.82258065
 0.83870968 0.90322581 0.85245902 0.85245902]

mean value: 0.8575885774722369

key: train_accuracy
value: [0.9118705  0.90827338 0.89748201 0.85971223 0.91546763 0.88848921
 0.88309353 0.89028777 0.93536804 0.92280072]

mean value: 0.9012845020213631

key: test_fscore
value: [0.87096774 0.9        0.89655172 0.73684211 0.87878788 0.84507042
 0.83333333 0.89655172 0.86567164 0.86567164]

mean value: 0.8589448213713017

key: train_fscore
value: [0.90875233 0.90876565 0.89142857 0.84210526 0.91965812 0.89491525
 0.88245931 0.88291747 0.93771626 0.92598967]

mean value: 0.8994707904383525

key: test_precision
value: [0.87096774 0.93103448 0.96296296 0.80769231 0.82857143 0.75
 0.86206897 0.96296296 0.78378378 0.80555556]

mean value: 0.8565600191740348

key: train_precision
value: [0.94208494 0.90391459 0.94736842 0.96296296 0.8762215  0.84615385
 0.88727273 0.94650206 0.90635452 0.88778878]

mean value: 0.9106624340187

key: test_recall
value: [0.87096774 0.87096774 0.83870968 0.67741935 0.93548387 0.96774194
 0.80645161 0.83870968 0.96666667 0.93548387]

mean value: 0.8708602150537634

key: train_recall
value: [0.87769784 0.91366906 0.84172662 0.74820144 0.9676259  0.94964029
 0.87769784 0.82733813 0.97132616 0.9676259 ]

mean value: 0.8942549186457286

key: test_roc_auc
value: [0.87096774 0.90322581 0.90322581 0.75806452 0.87096774 0.82258065
 0.83870968 0.90322581 0.85430108 0.85107527]

mean value: 0.8576344086021506

key: train_roc_auc
value: [0.9118705  0.90827338 0.89748201 0.85971223 0.91546763 0.88848921
 0.88309353 0.89028777 0.93530337 0.92288105]

mean value: 0.9012860679198577

key: test_jcc
value: [0.77142857 0.81818182 0.8125     0.58333333 0.78378378 0.73170732
 0.71428571 0.8125     0.76315789 0.76315789]

mean value: 0.7554036327560076

key: train_jcc
value: [0.83276451 0.83278689 0.80412371 0.72727273 0.85126582 0.80981595
 0.78964401 0.79037801 0.88273616 0.86217949]

mean value: 0.8182967266032459

MCC on Blind test: 0.15

Accuracy on Blind test: 0.77

Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifier(random_state=42))])

key: fit_time
value: [0.01394916 0.01445484 0.01928425 0.01152682 0.01178312 0.01156855
 0.01153874 0.0114634  0.01149082 0.01142311]

mean value: 0.012848281860351562

key: score_time
value: [0.01326442 0.01076293 0.01061916 0.01054502 0.01053524 0.01052785
 0.01054406 0.01065135 0.01066136 0.01065755]

mean value: 0.010876893997192383

key: test_mcc
value: [0.90369611 1.         0.90369611 0.87831007 0.80813523 0.93743687
 0.90369611 1.         0.77072165 0.90586325]

mean value: 0.9011555410657976

key: train_mcc
value: [0.91741458 0.92145965 0.92518498 0.92475364 0.93914669 0.91054923
 0.93214329 0.92475364 0.93206857 0.92840473]

mean value: 0.9255878992579923

key: test_accuracy
value: [0.9516129  1.         0.9516129  0.93548387 0.90322581 0.96774194
 0.9516129  1.         0.8852459  0.95081967]

mean value: 0.9497355896351137

key: train_accuracy
value: [0.95863309 0.96043165 0.96223022 0.96223022 0.96942446 0.95503597
 0.96582734 0.96223022 0.96588869 0.96409336]

mean value: 0.9626025212146261

key: test_fscore
value: [0.95238095 1.         0.95238095 0.93103448 0.90625    0.96875
 0.95238095 1.         0.88135593 0.95384615]

mean value: 0.9498379425951021

key: train_fscore
value: [0.95900178 0.96113074 0.96296296 0.96269982 0.96980462 0.95575221
 0.96637168 0.96269982 0.96637168 0.96441281]

mean value: 0.9631208137030208

key: test_precision
value: [0.9375     1.         0.9375     1.         0.87878788 0.93939394
 0.9375     1.         0.89655172 0.91176471]

mean value: 0.9438998248202102

key: train_precision
value: [0.95053004 0.94444444 0.94463668 0.95087719 0.95789474 0.94076655
 0.95121951 0.95087719 0.95454545 0.95422535]

mean value: 0.9500017150163743

key: test_recall
value: [0.96774194 1.         0.96774194 0.87096774 0.93548387 1.
 0.96774194 1.         0.86666667 1.        ]

mean value: 0.9576344086021505

key: train_recall
value: [0.9676259  0.97841727 0.98201439 0.97482014 0.98201439 0.97122302
 0.98201439 0.97482014 0.97849462 0.97482014]

mean value: 0.9766264407828575

key: test_roc_auc
value: [0.9516129  1.         0.9516129  0.93548387 0.90322581 0.96774194
 0.9516129  1.         0.88494624 0.95      ]

mean value: 0.9496236559139786

key: train_roc_auc
value: [0.95863309 0.96043165 0.96223022 0.96223022 0.96942446 0.95503597
 0.96582734 0.96223022 0.96586602 0.96411258]

mean value: 0.9626021763234573

key: test_jcc
value: [0.90909091 1.         0.90909091 0.87096774 0.82857143 0.93939394
 0.90909091 1.         0.78787879 0.91176471]

mean value: 0.906584933093472

key: train_jcc
value: [0.92123288 0.92517007 0.92857143 0.92808219 0.94137931 0.91525424
 0.93493151 0.92808219 0.93493151 0.93127148]

mean value: 0.9288906795867435
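Every per-fold test metric is a function of that fold's confusion matrix. Fold 9 of the Ridge Classifier run has 61 test rows, and its logged values determine the matrix:

- Recall 0.86666667 = 26/30 and precision 0.89655172 = 26/29 give TP = 26, FN = 4 and FP = 3.
- Accuracy 0.8852459 = 54/61 then gives TN = 28.

The sketch recomputes each logged metric from these counts. Ridge ClassifierCV reports identical fold-9 values. The logged `test_roc_auc` (0.88494624) equals the balanced accuracy, not the plain accuracy. That is consistent with AUC being computed from hard class predictions rather than continuous scores, but this is an inference that has not been checked against the script.

```python
import math

# Confusion counts inferred from the logged fold-9 metrics (61 test rows)
tp, fn = 26, 4   # 30 positives
tn, fp = 28, 3   # 31 negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)                      # Jaccard index of the positive class
balanced_acc = (recall + tn / (tn + fp)) / 2   # equals AUC of hard 0/1 predictions
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore), ("jcc", jcc),
                    ("roc_auc", balanced_acc), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```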

MCC on Blind test: 0.19

Accuracy on Blind test: 0.44

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
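The two SettingWithCopyWarning messages from katg_config.py (lines 203 and 206) come from calling `sort_values(..., inplace=True)` on a DataFrame that pandas flags as a possible copy of another frame, typically the result of boolean-mask selection. The code that builds `rouC_CT` and `rouC_BT` is not shown in this log. The sketch therefore assumes they are boolean-mask subsets of a larger results table; the `mask` and the `model_name` column are hypothetical. It shows two standard fixes. The sort acts on the subset object itself, so in this pattern the warning is usually harmless, but either fix makes the intent explicit and silences it. The MCC values are the ones reported in this section.

```python
import pandas as pd

# Mean CV test MCC and blind-test MCC for the models in this section (from the log)
results = pd.DataFrame({
    "model_name": ["Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.9742875819325697, 0.7239693864680706,
                 0.9011555410657976, 0.9110354846088805],
    "bts_mcc": [0.05, 0.15, 0.19, 0.15],
})

# Hypothetical filter standing in for however katg_config.py selects its subsets
mask = results["test_mcc"].notna()

# Fix 1: take an explicit copy before sorting in place
rouC_CT = results[mask].copy()
rouC_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Fix 2: avoid inplace and rebind the name to the sorted result
rouC_BT = results[mask].sort_values(by=["bts_mcc"], ascending=False)

print(rouC_CT[["model_name", "test_mcc"]])
print(rouC_BT[["model_name", "bts_mcc"]])
```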
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values',
       'electro_rr', 'electro_mm', 'electro_sm', 'electr...
       'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
       'logorI', 'lineage_proportion', 'dist_lineage_proportion',
       'lineage_count_all', 'lineage_count_unique'],
      dtype='object')),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])
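`RidgeClassifier(random_state=42)` uses a fixed regularisation strength (`alpha=1.0` by default). `RidgeClassifierCV(cv=10)` instead picks `alpha` from its `alphas` grid, which defaults to (0.1, 1.0, 10.0), by running its own 10-fold cross-validation on each training fold. Inside the outer 10-fold loop this amounts to nested cross-validation. The small sketch below uses synthetic `make_classification` data and shows where the selected value is exposed, through the fitted `alpha_` attribute.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

# Synthetic data for illustration only
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# alpha is chosen by internal 10-fold CV over the candidate grid
clf = RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), cv=10).fit(X, y)
print("selected alpha:", clf.alpha_)
```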

key: fit_time
value: [0.11430693 0.1973083  0.16238761 0.19749618 0.09576607 0.1363256
 0.09693003 0.12462544 0.20535517 0.23405504]

mean value: 0.1564556360244751

key: score_time
value: [0.01904321 0.02068663 0.02037406 0.02086973 0.01105285 0.01114559
 0.01939201 0.01616836 0.01520658 0.01612258]

mean value: 0.01700615882873535

key: test_mcc
value: [0.90369611 1.         0.93548387 0.87831007 0.84266484 0.93743687
 0.93743687 0.96824584 0.77072165 0.93635873]

mean value: 0.9110354846088805

key: train_mcc
value: [0.92844206 0.93563929 0.9393413  0.93563929 0.94986154 0.9393413
 0.93214329 0.92844206 0.94264494 0.93558747]

mean value: 0.9367082543906752

key: test_accuracy
value: [0.9516129  1.         0.96774194 0.93548387 0.91935484 0.96774194
 0.96774194 0.98387097 0.8852459  0.96721311]

mean value: 0.9546007403490216

key: train_accuracy
value: [0.96402878 0.9676259  0.96942446 0.9676259  0.97482014 0.96942446
 0.96582734 0.96402878 0.97127469 0.96768402]

mean value: 0.9681764462756545

key: test_fscore
value: [0.95238095 1.         0.96774194 0.93103448 0.92307692 0.96875
 0.96875    0.98360656 0.88135593 0.96875   ]

mean value: 0.9545446783280807

key: train_fscore
value: [0.96453901 0.96808511 0.9699115  0.96808511 0.97508897 0.9699115
 0.96637168 0.96453901 0.97153025 0.96797153]

mean value: 0.9686033664546803

key: test_precision
value: [0.9375     1.         0.96774194 1.         0.88235294 0.93939394
 0.93939394 1.         0.89655172 0.93939394]

mean value: 0.950232841898009

key: train_precision
value: [0.95104895 0.95454545 0.95470383 0.95454545 0.96478873 0.95470383
 0.95121951 0.95104895 0.96466431 0.95774648]

mean value: 0.9559015511110829

key: test_recall
value: [0.96774194 1.         0.96774194 0.87096774 0.96774194 1.
 1.         0.96774194 0.86666667 1.        ]

mean value: 0.9608602150537635

key: train_recall
value: [0.97841727 0.98201439 0.98561151 0.98201439 0.98561151 0.98561151
 0.98201439 0.97841727 0.97849462 0.97841727]

mean value: 0.9816624120058792

key: test_roc_auc
value: [0.9516129  1.         0.96774194 0.93548387 0.91935484 0.96774194
 0.96774194 0.98387097 0.88494624 0.96666667]

mean value: 0.9545161290322581

key: train_roc_auc
value: [0.96402878 0.9676259  0.96942446 0.9676259  0.97482014 0.96942446
 0.96582734 0.96402878 0.9712617  0.96770326]

mean value: 0.9681770712462289

key: test_jcc
value: [0.90909091 1.         0.9375     0.87096774 0.85714286 0.93939394
 0.93939394 0.96774194 0.78787879 0.93939394]

mean value: 0.9148504049713727

key: train_jcc
value: [0.93150685 0.93814433 0.94158076 0.93814433 0.95138889 0.94158076
 0.93493151 0.93150685 0.94463668 0.93793103]

mean value: 0.9391351978873097

MCC on Blind test: 0.15

Accuracy on Blind test: 0.38
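For all four models in this section, the blind-test MCC (0.05 to 0.19) is far below the mean cross-validated test MCC (0.72 to 0.97). Gradient Boosting, the only model with perfect training scores, shows the largest drop. The sketch below ranks the drop using the logged values. One possible contributor has not been verified against the script. The fold sizes (62 or 61 test rows, 556 or 557 training rows) imply about 618 rows in total, twice the 309 majority-class rows, which suggests the minority class was oversampled. If that happened before the CV split, duplicated rows could appear in both training and test folds and inflate the CV scores.

```python
# Mean 10-fold CV test MCC and blind-test MCC for this section's models (from the log)
results = {
    "Gradient Boosting": (0.9742875819325697, 0.05),
    "QDA": (0.7239693864680706, 0.15),
    "Ridge Classifier": (0.9011555410657976, 0.19),
    "Ridge ClassifierCV": (0.9110354846088805, 0.15),
}

# Drop from CV MCC to blind-test MCC, largest first
gaps = sorted(((cv - bts, name) for name, (cv, bts) in results.items()), reverse=True)
for gap, name in gaps:
    cv, bts = results[name]
    print(f"{name:<20} CV MCC {cv:.2f}  blind MCC {bts:.2f}  drop {gap:.2f}")
```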