LSHTM_analysis/scripts/ml/log_pnca_config.txt
2022-06-20 21:55:47 +01:00


/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
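The warning above is raised at `ml_data.py:550`, where `sort_values(..., inplace=True)` is applied to `mask_check`. The source of `ml_data.py` is not shown in this log, so the following is a minimal sketch that assumes `mask_check` was produced by boolean-mask slicing of a larger DataFrame. The toy DataFrame `df` and the mask condition are hypothetical. Taking an explicit `.copy()` of the slice makes `mask_check` an independent frame, so the in-place sort no longer targets a "copy of a slice".

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the features table; only 'ligand_distance' comes from the log.
df = pd.DataFrame({"ligand_distance": [12.0, 3.5, 8.1, 20.4],
                   "active_site": [0, 1, 1, 0]})

# Hypothetical mask; the real condition used in ml_data.py is not shown.
mask = df["ligand_distance"] < 15

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Explicit copy: mask_check no longer refers back to df,
    # so sorting it in place is unambiguous.
    mask_check = df[mask].copy()
    mask_check.sort_values(by=["ligand_distance"], ascending=True, inplace=True)

# Match by name so the check also runs on pandas versions without this warning class.
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(mask_check["ligand_distance"].tolist())
```

Dropping `inplace=True` and assigning the result, e.g. `mask_check = df[mask].sort_values(by=["ligand_distance"])`, avoids the warning as well, because the sort then returns a new frame instead of modifying the slice.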
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
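The aaindex cleaning steps logged above (keep numerical columns only, then drop every column that contains NA) correspond to two standard pandas operations. A minimal sketch on a hypothetical toy frame; the column names and values are illustrative and are not real aaindex columns, and the script's own implementation is not shown in this log.

```python
import numpy as np
import pandas as pd

# Hypothetical aaindex-like table: 2 non-numerical columns and one numerical column with NA.
aaindex_df = pd.DataFrame({
    "mutation": ["A1B", "C2D", "E3F"],   # non-numerical
    "wild_type": ["A", "C", "E"],        # non-numerical
    "idx_1": [0.1, 0.5, 0.9],
    "idx_2": [1.0, np.nan, 3.0],         # contains NA, so it is dropped below
    "idx_3": [2, 4, 6],
})

# Step 1: keep numerical (int and float) columns only.
num_df = aaindex_df.select_dtypes(include="number")

# Step 2: list the columns with at least one NA, then drop them.
na_cols = [col for col in num_df.columns if num_df[col].isna().any()]
clean_df = num_df.dropna(axis=1)  # how="any" by default: drop a column if any value is NA

print("ncols with NA:", len(na_cols))
print("Original ncols:", num_df.shape[1])
print("Revised df ncols:", clean_df.shape[1])
```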
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq 102
log10_or_mychisq 102
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 43
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
Original Data
Counter({1: 114, 0: 71}) Data dim: (185, 50)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (185, 50)
Test data size: (239, 50)
y_train numbers: Counter({1: 114, 0: 71})
y_train ratio: 0.6228070175438597
y_test_numbers: Counter({0: 120, 1: 119})
y_test ratio: 1.0084033613445378
-------------------------------------------------------------
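The logged ratios are consistent with dividing the count of class 0 by the count of class 1: 71/114 ≈ 0.6228 for the training set and 120/119 ≈ 1.0084 for the blind test set. This convention is inferred from the printed numbers rather than from the script itself. A minimal sketch that reconstructs label lists with the counts from the log:

```python
from collections import Counter


def class_ratio(labels):
    """Ratio of class-0 to class-1 counts (convention inferred from the log)."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Label lists rebuilt from the Counter output printed above.
y_train = [1] * 114 + [0] * 71
y_test = [0] * 120 + [1] * 119

print(Counter(y_train), class_ratio(y_train))
print(Counter(y_test), class_ratio(y_test))
```

A ratio below 1 therefore means class 0 is the minority class, which is the case for the training set here.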
Simple Random OverSampling
Counter({0: 114, 1: 114})
(228, 50)
Simple Random UnderSampling
Counter({0: 71, 1: 71})
(142, 50)
Simple Combined Over and UnderSampling
Counter({0: 114, 1: 114})
(228, 50)
SMOTE_NC OverSampling
Counter({0: 114, 1: 114})
(228, 50)
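The oversampled and undersampled counts above follow from resampling each class to the majority size (114) or the minority size (71). The log does not show which library performed the resampling, so the following is a plain-Python sketch of simple random oversampling (drawing extra rows with replacement) and random undersampling (sampling without replacement), not the script's implementation. The rows are placeholders for the 50-column feature rows.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Map each class label to the list of rows carrying it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=42):
    """Grow every class to the majority size by drawing extra rows with replacement."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=42):
    """Shrink every class to the minority size by sampling without replacement."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Placeholder rows with the training-set class counts from the log.
labels = [1] * 114 + [0] * 71
rows = list(range(len(labels)))

x_over, y_over = random_oversample(rows, labels)
x_under, y_under = random_undersample(rows, labels)
print(Counter(y_over), len(y_over))
print(Counter(y_under), len(y_under))
```

SMOTE-NC works differently: it synthesises new minority rows by interpolating numerical features between nearest neighbours and assigns each categorical feature the most frequent category among those neighbours, which is why it needs to know which columns are categorical.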
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/uq_v1/
Sanity checks:
Total input features: 50
Training data size: (185, 50)
Test data size: (239, 50)
Target feature numbers (training data): Counter({1: 114, 0: 71})
Target features ratio (training data): 0.6228070175438597
Target feature numbers (test data): Counter({0: 120, 1: 119})
Target features ratio (test data): 1.0084033613445378
#####################################################################
================================================================
Structural features (n): 34
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
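The "Pass: No. of features match" check can be reproduced from the group sizes printed above: 8 common stability + 23 FoldX + 3 other structural columns give 34 structural features, and 34 + 3 evolutionary + 6 genomic = 43 numerical features, which plus 7 categorical features gives the 50 total input features. A minimal sketch using the column names from the log; how the script itself performs the check is not shown.

```python
common_stability = ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
                    'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
foldx = ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss',
         'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss',
         'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss',
         'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss',
         'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss',
         'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
other_struc = ['rsa', 'kd_values', 'rd_values']
evolutionary = ['consurf_score', 'snap2_score', 'provean_score']
genomic = ['maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion',
           'lineage_count_all', 'lineage_count_unique']
categorical = ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change',
               'water_change', 'drtype_mode_labels', 'active_site']

structural = common_stability + foldx + other_struc
numerical = structural + evolutionary + genomic
all_features = numerical + categorical

# Group sizes as printed in the log.
assert len(structural) == 34
assert len(numerical) == 43
assert len(all_features) == 50
# Guard against a column being listed in two groups.
assert len(set(all_features)) == len(all_features), "duplicate column names"
print("Pass: No. of features match")
```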
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
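The `prep` step above rescales each numerical column to [0, 1] with `MinMaxScaler`, using x' = (x - min) / (max - min) with the minimum and maximum learned during fitting, and expands each categorical column into one indicator column per category with `OneHotEncoder`. A plain-Python sketch of that arithmetic on hypothetical toy values; the real pipeline uses the scikit-learn transformers shown above, and the category labels below are made up.

```python
def fit_min_max(values):
    """Learn (min, max) from training values."""
    return min(values), max(values)


def transform_min_max(values, lo, hi):
    """Scale to [0, 1] with the training min/max; this sketch maps a constant column to 0.0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def fit_one_hot(categories):
    """Distinct categories in sorted order, one output column each."""
    return sorted(set(categories))


def transform_one_hot(categories, vocab):
    """One row of 0/1 indicators per input value."""
    return [[1 if c == v else 0 for v in vocab] for c in categories]


# Hypothetical numerical column (e.g. residue contact counts).
contacts_train = [2.0, 4.0, 10.0, 6.0]
lo, hi = fit_min_max(contacts_train)
scaled = transform_min_max(contacts_train, lo, hi)

# Hypothetical categorical column with made-up secondary-structure labels.
ss_class_train = ["helix", "strand", "other", "helix"]
vocab = fit_one_hot(ss_class_train)
encoded = transform_one_hot(ss_class_train, vocab)

print(scaled)
print(vocab, encoded)
```

Because the min and max are learned on the training folds only, values in a held-out fold can fall slightly outside [0, 1]; that is expected behaviour for min-max scaling.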
key: fit_time
value: [0.01643085 0.01587701 0.01659703 0.01716876 0.0157814 0.01725698
0.0168314 0.01593757 0.01981902 0.01665521]
mean value: 0.016835522651672364
key: score_time
value: [0.01110053 0.01039171 0.01039815 0.01039696 0.01042223 0.01038051
0.01035452 0.01037598 0.01076961 0.01039314]
mean value: 0.010498332977294921
key: test_mcc
value: [0.33796318 0.58655573 0.28690229 0.67460105 0.6761234 0.64465837
1. 0.12182898 0.67005939 0.52299758]
mean value: 0.5521689989382099
key: train_mcc
value: [0.78194719 0.69251873 0.70439866 0.69166175 0.69166175 0.72007099
0.73268764 0.74454326 0.77164805 0.75735135]
mean value: 0.7288489368704532
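Each `value` array above holds one score per cross-validation fold (10 folds here), and `mean value` is their arithmetic mean; for example, the ten `test_mcc` values average to about 0.5522. MCC itself is computed from the binary confusion matrix as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch of both; the confusion-matrix counts in the usage lines are hypothetical, and returning 0.0 for a zero denominator is a convention assumed here.

```python
import math
from statistics import mean


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (denominator of 0), by assumption.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Per-fold test MCC values copied from the log (printed to 8 decimal places).
test_mcc = [0.33796318, 0.58655573, 0.28690229, 0.67460105, 0.6761234,
            0.64465837, 1.0, 0.12182898, 0.67005939, 0.52299758]
print(mean(test_mcc))  # close to the logged 0.5521689989382099

print(mcc(tp=10, tn=10, fp=0, fn=0))  # every prediction correct
print(mcc(tp=0, tn=0, fp=5, fn=5))    # every prediction wrong
```

The gap between the mean train MCC (about 0.73) and the mean test MCC (about 0.55), together with the wide fold-to-fold spread (0.12 to 1.0), is what these per-fold arrays make visible.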
key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
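The warning reports that the `lbfgs` solver stopped at its iteration limit (the `LogisticRegression` default is `max_iter=100`) before meeting its convergence tolerance. Since the pipeline already applies `MinMaxScaler` to the numerical columns, the usual next step is to raise `max_iter` on the model step or try another `solver`. A minimal sketch on synthetic data from `make_classification`, sized like the training set as a stand-in for the real features; `max_iter=1000` is an assumed value, not a tuned setting, and the MCC scorer is named so the result keys match the `test_mcc`/`train_mcc` keys in this log.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 185 rows, 43 numerical features (the real data is not reproduced).
X, y = make_classification(n_samples=185, n_features=43, random_state=42)

pipe = Pipeline([
    ("prep", MinMaxScaler()),
    # Assumed setting: a higher iteration cap than the default of 100.
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = cross_validate(pipe, X, y, cv=10,
                            scoring={"mcc": make_scorer(matthews_corrcoef)},
                            return_train_score=True)

# Count only convergence warnings raised while fitting the 10 folds.
n_convergence_warnings = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings:", n_convergence_warnings)
print("mean test MCC:", scores["test_mcc"].mean())
```

The same `max_iter` argument is accepted by `LogisticRegressionCV`, which also appears in the model list above with its default settings.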
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.68421053 0.78947368 0.68421053 0.84210526 0.84210526 0.83333333
1. 0.61111111 0.83333333 0.77777778]
mean value: 0.789766081871345
key: train_accuracy
value: [0.89759036 0.85542169 0.86144578 0.85542169 0.85542169 0.86826347
0.8742515 0.88023952 0.89221557 0.88622754]
mean value: 0.8726498809609696
key: test_fscore
value: [0.75 0.81818182 0.76923077 0.86956522 0.88888889 0.86956522
1. 0.72 0.88 0.83333333]
mean value: 0.8398765244417419
key: train_fscore
value: [0.9178744 0.88888889 0.89099526 0.88785047 0.88785047 0.89908257
0.90322581 0.90654206 0.91666667 0.91079812]
mean value: 0.9009774700333214
key: test_precision
value: [0.69230769 0.9 0.71428571 0.90909091 0.8 0.83333333
1. 0.64285714 0.78571429 0.76923077]
mean value: 0.8046819846819847
key: train_precision
value: [0.91346154 0.84210526 0.86238532 0.84821429 0.84821429 0.85217391
0.85964912 0.87387387 0.87610619 0.88181818]
mean value: 0.8658001980381739
key: test_recall
value: [0.81818182 0.75 0.83333333 0.83333333 1. 0.90909091
1. 0.81818182 1. 0.90909091]
mean value: 0.8871212121212121
key: train_recall
value: [0.9223301 0.94117647 0.92156863 0.93137255 0.93137255 0.95145631
0.95145631 0.94174757 0.96116505 0.94174757]
mean value: 0.9395393108699791
key: test_roc_auc
value: [0.65909091 0.80357143 0.63095238 0.8452381 0.78571429 0.81168831
1. 0.55194805 0.78571429 0.74025974]
mean value: 0.7614177489177489
key: train_roc_auc
value: [0.88973648 0.82996324 0.84359681 0.83287377 0.83287377 0.84291566
0.85072816 0.86149879 0.87120752 0.86931129]
mean value: 0.8524705482921324
key: test_jcc
value: [0.6 0.69230769 0.625 0.76923077 0.8 0.76923077
1. 0.5625 0.78571429 0.71428571]
mean value: 0.7318269230769231
key: train_jcc
value: [0.84821429 0.8 0.8034188 0.79831933 0.79831933 0.81666667
0.82352941 0.82905983 0.84615385 0.8362069 ]
mean value: 0.8199888394792046
MCC on Blind test: 0.32
Accuracy on Blind test: 0.64
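The repeated lbfgs ConvergenceWarning above means the default iteration budget (max_iter=100) ran out before the optimizer converged. Below is a minimal sketch of the two remedies the warning itself suggests: scale the inputs and raise max_iter. It assumes synthetic data from make_classification in place of the real feature table, and it uses MinMaxScaler to mirror the scaler in the pipelines printed in this log. The value max_iter=5000 is an illustrative choice, not a recommendation from the original script.

```python
# Sketch: avoid lbfgs ConvergenceWarning by scaling features and raising max_iter.
# Synthetic data stands in for the real feature table (assumption).
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=185, n_features=50, random_state=42)
X = X * np.logspace(0, 4, X.shape[1])  # exaggerate scale differences between columns

model = make_pipeline(
    MinMaxScaler(),                                     # rescale each column to [0, 1]
    LogisticRegression(max_iter=5000, random_state=42),  # larger iteration budget
)

with warnings.catch_warnings():
    # Turn the warning into an exception so a non-converged fit cannot pass silently.
    warnings.simplefilter("error", ConvergenceWarning)
    model.fit(X, y)

# n_iter_ has shape (1,) for a binary problem.
n_iter = model.named_steps["logisticregression"].n_iter_[0]
print("iterations used:", n_iter)
```

Raising the warning to an error is optional; it simply makes a silent non-convergence impossible during experimentation.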
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
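The pipeline printed above applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones, passes any remaining columns through unchanged, and then fits the model. A minimal sketch of the same structure follows. The toy DataFrame reuses a few column names from the log, but its values are invented for illustration, and cv=3 is set only because the toy data has three rows per class (the logged model uses the LogisticRegressionCV default).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: column names taken from the log, values invented for illustration.
df = pd.DataFrame({
    "ligand_distance": [3.1, 12.4, 7.8, 20.0, 5.5, 15.2],
    "rsa": [0.05, 0.60, 0.30, 0.90, 0.10, 0.75],
    "ss_class": ["helix", "sheet", "loop", "loop", "helix", "sheet"],
    "active_site": ["1", "0", "0", "0", "1", "0"],
})
y = [1, 0, 1, 0, 1, 0]

# Split columns by dtype, analogous to the 'num' and 'cat' Index objects in the log.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    ("model", LogisticRegressionCV(cv=3, random_state=42)),
])
pipe.fit(df, y)

# 2 scaled numeric columns + 3 ss_class dummies + 2 active_site dummies = 7 features.
X_t = pipe.named_steps["prep"].transform(df)
print(X_t.shape)
```

Keeping the preprocessing inside the Pipeline means the scaler and encoder are refit on each training fold during cross-validation, so no statistics leak from the held-out fold.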
key: fit_time
value: [0.41101408 0.43322134 0.43520188 0.423666 0.40705562 0.42979956
0.44227242 0.42043042 0.44399595 0.42415285]
mean value: 0.4270810127258301
key: score_time
value: [0.01099348 0.01108956 0.01145554 0.01103401 0.01119256 0.02155566
0.01103187 0.01108336 0.0113132 0.01128364]
mean value: 0.012203288078308106
key: test_mcc
value: [0.45868247 0.54761905 0.88949918 0.80507649 1. 0.76623377
0.71350607 0.52299758 0.67005939 0.4025974 ]
mean value: 0.6776271401537742
key: train_mcc
value: [0.93615116 0.87323164 0.8982762 0.88572497 0.91158328 1.
0.87286094 0.89863369 0.94933931 0.98737524]
mean value: 0.9213176411447679
key: test_accuracy
value: [0.73684211 0.78947368 0.94736842 0.89473684 1. 0.88888889
0.83333333 0.77777778 0.83333333 0.66666667]
mean value: 0.8368421052631578
key: train_accuracy
value: [0.96987952 0.93975904 0.95180723 0.94578313 0.95783133 1.
0.94011976 0.95209581 0.9760479 0.99401198]
mean value: 0.9627335690065651
key: test_fscore
value: [0.8 0.83333333 0.96 0.90909091 1. 0.90909091
0.84210526 0.83333333 0.88 0.66666667]
mean value: 0.8633620414673047
key: train_fscore
value: [0.97607656 0.95238095 0.96153846 0.9569378 0.96650718 1.
0.95192308 0.96190476 0.98076923 0.99516908]
mean value: 0.9703207096742565
key: test_precision
value: [0.71428571 0.83333333 0.92307692 1. 1. 0.90909091
1. 0.76923077 0.78571429 0.85714286]
mean value: 0.8791874791874792
key: train_precision
value: [0.96226415 0.92592593 0.94339623 0.93457944 0.94392523 1.
0.94285714 0.94392523 0.97142857 0.99038462]
mean value: 0.9558686539496802
key: test_recall
value: [0.90909091 0.83333333 1. 0.83333333 1. 0.90909091
0.72727273 0.90909091 1. 0.54545455]
mean value: 0.8666666666666667
key: train_recall
value: [0.99029126 0.98039216 0.98039216 0.98039216 0.99019608 1.
0.96116505 0.98058252 0.99029126 1. ]
mean value: 0.9853702646106987
key: test_roc_auc
value: [0.70454545 0.77380952 0.92857143 0.91666667 1. 0.88311688
0.86363636 0.74025974 0.78571429 0.7012987 ]
mean value: 0.8297619047619048
key: train_roc_auc
value: [0.9633996 0.92769608 0.94332108 0.93550858 0.94822304 1.
0.93370752 0.94341626 0.97170813 0.9921875 ]
mean value: 0.9559167791307461
key: test_jcc
value: [0.66666667 0.71428571 0.92307692 0.83333333 1. 0.83333333
0.72727273 0.71428571 0.78571429 0.5 ]
mean value: 0.7697968697968698
key: train_jcc
value: [0.95327103 0.90909091 0.92592593 0.91743119 0.93518519 1.
0.90825688 0.9266055 0.96226415 0.99038462]
mean value: 0.9428415392549067
MCC on Blind test: 0.2
Accuracy on Blind test: 0.59
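Each model block reports key/value/mean value triples such as test_mcc and train_jcc. That naming matches what sklearn.model_selection.cross_validate returns when it is given a dict of scorers together with return_train_score=True. The sketch below reproduces the output shape on synthetic data. The scorer names are copied from the log, but the exact scorer definitions used by the original script are not shown here, so treat them as assumptions. In particular, roc_auc is defined on predicted labels in this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, mildly imbalanced data (assumption; not the real feature table).
X, y = make_classification(n_samples=185, n_features=20, weights=[0.38], random_state=42)

scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # applied to hard 0/1 predictions here
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```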
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00974631 0.00932074 0.0071764 0.0069828 0.00684166 0.0073626
0.00683832 0.00736499 0.00704551 0.00715041]
mean value: 0.007582974433898926
key: score_time
value: [0.01076746 0.01019335 0.00825691 0.00814319 0.00808716 0.00803375
0.00830436 0.00816035 0.00810671 0.00811577]
mean value: 0.008616900444030762
key: test_mcc
value: [ 0.5077524 0.26772484 -0.12677314 0.40849122 0.09356015 0.39594419
0.44320263 0.0805823 0.0805823 0.56061191]
mean value: 0.2711678789936081
key: train_mcc
value: [0.39956942 0.36799004 0.44276724 0.40782666 0.39882278 0.42873208
0.40887563 0.43322852 0.42873208 0.41898177]
mean value: 0.41355262056408115
key: test_accuracy
value: [0.73684211 0.68421053 0.52631579 0.73684211 0.63157895 0.72222222
0.72222222 0.61111111 0.61111111 0.77777778]
mean value: 0.6760233918128655
key: train_accuracy
value: [0.72289157 0.69277108 0.74096386 0.72289157 0.71686747 0.73053892
0.7245509 0.73652695 0.73053892 0.73053892]
mean value: 0.7249080152947118
key: test_fscore
value: [0.81481481 0.78571429 0.66666667 0.81481481 0.75862069 0.8
0.81481481 0.74074074 0.74074074 0.84615385]
mean value: 0.7783081414115897
key: train_fscore
value: [0.81147541 0.8 0.81702128 0.80991736 0.80816327 0.81632653
0.81147541 0.81666667 0.81632653 0.81327801]
mean value: 0.8120650453135811
key: test_precision
value: [0.6875 0.6875 0.6 0.73333333 0.64705882 0.71428571
0.6875 0.625 0.625 0.73333333]
mean value: 0.6740511204481793
key: train_precision
value: [0.70212766 0.66666667 0.72180451 0.7 0.69230769 0.70422535
0.70212766 0.71532847 0.70422535 0.71014493]
mean value: 0.7018958288316359
key: test_recall
value: [1. 0.91666667 0.75 0.91666667 0.91666667 0.90909091
1. 0.90909091 0.90909091 1. ]
mean value: 0.9227272727272727
key: train_recall
value: [0.96116505 1. 0.94117647 0.96078431 0.97058824 0.97087379
0.96116505 0.95145631 0.97087379 0.95145631]
mean value: 0.9639539310869979
key: test_roc_auc
value: [0.6875 0.60119048 0.44642857 0.67261905 0.5297619 0.66883117
0.64285714 0.52597403 0.52597403 0.71428571]
mean value: 0.6015422077922078
key: train_roc_auc
value: [0.64724919 0.6015625 0.68152574 0.65226716 0.64154412 0.65731189
0.65245752 0.67104066 0.65731189 0.66322816]
mean value: 0.6525498822101656
key: test_jcc
value: [0.6875 0.64705882 0.5 0.6875 0.61111111 0.66666667
0.6875 0.58823529 0.58823529 0.73333333]
mean value: 0.6397140522875817
key: train_jcc
value: [0.68275862 0.66666667 0.69064748 0.68055556 0.67808219 0.68965517
0.68275862 0.69014085 0.68965517 0.68531469]
mean value: 0.6836235012609437
MCC on Blind test: 0.44
Accuracy on Blind test: 0.69
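The test_roc_auc values in these runs appear consistent with ROC AUC computed from hard 0/1 predictions rather than from probabilities. With a single threshold, the ROC curve has one interior point, so AUC = (1 + TPR - FPR)/2 = (TPR + TNR)/2, which equals balanced accuracy. As a check, take Gaussian NB fold 1 above: test_recall 1.0 and test_precision 0.6875 on a 19-sample fold fit 11 positives all predicted positive, plus 5 of 8 negatives predicted positive. That reconstruction also reproduces the logged test_accuracy (14/19) and test_mcc (0.5077524). This is an inference from the logged numbers, not something the script output states.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Confusion counts inferred from Gaussian NB fold 1: TP=11, FN=0, FP=5, TN=3.
y_true = np.array([1] * 11 + [0] * 8)
y_pred = np.array([1] * 11 + [1] * 5 + [0] * 3)

auc_hard = roc_auc_score(y_true, y_pred)       # AUC on labels, not probabilities
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(auc_hard, bal_acc)  # both equal 0.6875, matching the logged test_roc_auc
```

If a threshold-free AUC is wanted, the scorer would need predicted probabilities or decision-function scores instead of labels.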
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00754261 0.00742865 0.00720763 0.00705886 0.00744486 0.00740099
0.00711918 0.00755024 0.00717258 0.00742507]
mean value: 0.007335066795349121
key: score_time
value: [0.00887275 0.00809169 0.00855422 0.00822783 0.00839949 0.0080905
0.00788832 0.00815272 0.00828528 0.00825739]
mean value: 0.008282017707824708
key: test_mcc
value: [ 0.21660006 0.32142857 0.23262105 0.28690229 0.28690229 0.43320011
0.16116459 -0.24029619 0.40291148 0.40291148]
mean value: 0.2504345746462975
key: train_mcc
value: [0.34619876 0.33098314 0.29538063 0.35569507 0.35404664 0.3240165
0.35981593 0.37214605 0.27958995 0.33041139]
mean value: 0.3348284059138056
key: test_accuracy
value: [0.63157895 0.68421053 0.63157895 0.68421053 0.68421053 0.72222222
0.61111111 0.44444444 0.72222222 0.72222222]
mean value: 0.6538011695906433
key: train_accuracy
value: [0.69879518 0.69277108 0.6746988 0.70481928 0.69879518 0.68862275
0.70658683 0.71257485 0.67065868 0.68862275]
mean value: 0.6936945386335762
key: test_fscore
value: [0.72 0.75 0.69565217 0.76923077 0.76923077 0.76190476
0.69565217 0.58333333 0.7826087 0.7826087 ]
mean value: 0.7310221372830068
key: train_fscore
value: [0.76635514 0.76497696 0.74766355 0.77625571 0.76190476 0.75925926
0.77625571 0.78181818 0.74885845 0.75471698]
mean value: 0.7638064697242107
key: test_precision
value: [0.64285714 0.75 0.72727273 0.71428571 0.71428571 0.8
0.66666667 0.53846154 0.75 0.75 ]
mean value: 0.7053829503829504
key: train_precision
value: [0.73873874 0.72173913 0.71428571 0.72649573 0.74074074 0.72566372
0.73275862 0.73504274 0.70689655 0.73394495]
mean value: 0.7276306629094831
key: test_recall
value: [0.81818182 0.75 0.66666667 0.83333333 0.83333333 0.72727273
0.72727273 0.63636364 0.81818182 0.81818182]
mean value: 0.7628787878787879
key: train_recall
value: [0.7961165 0.81372549 0.78431373 0.83333333 0.78431373 0.7961165
0.82524272 0.83495146 0.7961165 0.77669903]
mean value: 0.8040928992956405
key: test_roc_auc
value: [0.59659091 0.66071429 0.61904762 0.63095238 0.63095238 0.72077922
0.57792208 0.38961039 0.69480519 0.69480519]
mean value: 0.6216179653679654
key: train_roc_auc
value: [0.66789952 0.65686275 0.64215686 0.66666667 0.67340686 0.65587075
0.67043386 0.67528823 0.63243325 0.66178701]
mean value: 0.6602805766319473
key: test_jcc
value: [0.5625 0.6 0.53333333 0.625 0.625 0.61538462
0.53333333 0.41176471 0.64285714 0.64285714]
mean value: 0.5792030273647921
key: train_jcc
value: [0.62121212 0.61940299 0.59701493 0.63432836 0.61538462 0.6119403
0.63432836 0.64179104 0.59854015 0.60606061]
mean value: 0.6180003458791998
MCC on Blind test: 0.51
Accuracy on Blind test: 0.74
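The test_jcc and test_fscore arrays are not independent. For a single binary confusion matrix, Jaccard J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The short check below uses the per-fold BernoulliNB values copied from the log above. The two metrics therefore rank models identically within a fold and carry no extra information relative to each other.

```python
import numpy as np

# Per-fold test scores copied from the BernoulliNB run above.
test_fscore = np.array([0.72, 0.75, 0.69565217, 0.76923077, 0.76923077,
                        0.76190476, 0.69565217, 0.58333333, 0.7826087, 0.7826087])
test_jcc = np.array([0.5625, 0.6, 0.53333333, 0.625, 0.625,
                     0.61538462, 0.53333333, 0.41176471, 0.64285714, 0.64285714])

# J = F1 / (2 - F1) holds fold by fold for binary labels.
predicted_jcc = test_fscore / (2 - test_fscore)
print(np.allclose(predicted_jcc, test_jcc, atol=1e-6))  # True
```

Note that the identity holds per fold only. The mean values differ slightly because the transform is nonlinear.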
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00691724 0.00921154 0.00725603 0.00683308 0.00641084 0.0067122
0.00755239 0.00700617 0.00747585 0.00672269]
mean value: 0.00720980167388916
key: score_time
value: [0.04755116 0.03781438 0.01461935 0.01340771 0.01276135 0.01304817
0.01399469 0.01276302 0.01416636 0.01354003]
mean value: 0.01936662197113037
key: test_mcc
value: [ 0.33796318 0.14085904 0.32142857 -0.33071891 -0.20865621 0.12182898
-0.02548236 0.2987013 0.12182898 0.53246753]
mean value: 0.13102201054732363
key: train_mcc
value: [0.51724228 0.58603243 0.6140767 0.51866448 0.57255314 0.57404517
0.54744208 0.57404517 0.6296076 0.53388143]
mean value: 0.5667590493666902
key: test_accuracy
value: [0.68421053 0.63157895 0.68421053 0.47368421 0.47368421 0.61111111
0.5 0.66666667 0.61111111 0.77777778]
mean value: 0.6114035087719298
key: train_accuracy
value: [0.77710843 0.80722892 0.81927711 0.77710843 0.80120482 0.80239521
0.79041916 0.80239521 0.82634731 0.78443114]
mean value: 0.7987915734795469
key: test_fscore
value: [0.75 0.74074074 0.75 0.64285714 0.61538462 0.72
0.57142857 0.72727273 0.72 0.81818182]
mean value: 0.7055865615865616
key: train_fscore
value: [0.83842795 0.85321101 0.86363636 0.83257919 0.84651163 0.85067873
0.84304933 0.85067873 0.86995516 0.83486239]
mean value: 0.8483590469525649
key: test_precision
value: [0.69230769 0.66666667 0.75 0.5625 0.57142857 0.64285714
0.6 0.72727273 0.64285714 0.81818182]
mean value: 0.6674071761571762
key: train_precision
value: [0.76190476 0.80172414 0.80508475 0.77310924 0.80530973 0.79661017
0.78333333 0.79661017 0.80833333 0.79130435]
mean value: 0.7923323977285066
key: test_recall
value: [0.81818182 0.83333333 0.75 0.75 0.66666667 0.81818182
0.54545455 0.72727273 0.81818182 0.81818182]
mean value: 0.7545454545454545
key: train_recall
value: [0.93203883 0.91176471 0.93137255 0.90196078 0.89215686 0.91262136
0.91262136 0.91262136 0.94174757 0.88349515]
mean value: 0.9132400533028746
key: test_roc_auc
value: [0.65909091 0.55952381 0.66071429 0.375 0.4047619 0.55194805
0.48701299 0.64935065 0.55194805 0.76623377]
mean value: 0.5665584415584416
key: train_roc_auc
value: [0.72792418 0.77619485 0.78599877 0.74004289 0.77420343 0.76881068
0.75318568 0.76881068 0.79118629 0.75424757]
mean value: 0.7640605028419135
key: test_jcc
value: [0.6 0.58823529 0.6 0.47368421 0.44444444 0.5625
0.4 0.57142857 0.5625 0.69230769]
mean value: 0.5495100212824671
key: train_jcc
value: [0.72180451 0.744 0.76 0.71317829 0.73387097 0.74015748
0.72868217 0.74015748 0.76984127 0.71653543]
mean value: 0.7368227607678467
MCC on Blind test: 0.22
Accuracy on Blind test: 0.6
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00955367 0.00924325 0.00820327 0.00793886 0.00868225 0.00840187
0.00826693 0.00931406 0.00927305 0.00903201]
mean value: 0.008790922164916993
key: score_time
value: [0.0091145 0.00844264 0.00794554 0.00818419 0.00846839 0.00918674
0.00848913 0.00863361 0.00857615 0.00812507]
mean value: 0.008516597747802734
key: test_mcc
value: [ 0.34405118 0.14085904 -0.03149704 0.14085904 0.3086067 0.56061191
0.44320263 0.0805823 0.3040345 0.56061191]
mean value: 0.2851922165850045
key: train_mcc
value: [0.65495721 0.59292706 0.63691667 0.64636933 0.56076174 0.57399753
0.57517958 0.70283753 0.55505316 0.64203075]
mean value: 0.6141030557952815
key: test_accuracy
value: [0.68421053 0.63157895 0.57894737 0.63157895 0.68421053 0.77777778
0.72222222 0.61111111 0.66666667 0.77777778]
mean value: 0.6766081871345029
key: train_accuracy
value: [0.8313253 0.79518072 0.8253012 0.8253012 0.78313253 0.79041916
0.79640719 0.85628743 0.78443114 0.82634731]
mean value: 0.8114133179424284
key: test_fscore
value: [0.76923077 0.74074074 0.71428571 0.74074074 0.8 0.84615385
0.81481481 0.74074074 0.78571429 0.84615385]
mean value: 0.7798575498575498
key: train_fscore
value: [0.87931034 0.85714286 0.8722467 0.87445887 0.8487395 0.85355649
0.85470085 0.89380531 0.8487395 0.87445887]
mean value: 0.8657159288311089
key: test_precision
value: [0.66666667 0.66666667 0.625 0.66666667 0.66666667 0.73333333
0.6875 0.625 0.64705882 0.73333333]
mean value: 0.6717892156862745
key: train_precision
value: [0.79069767 0.75 0.792 0.78294574 0.74264706 0.75
0.76335878 0.82113821 0.74814815 0.7890625 ]
mean value: 0.7729998107832459
key: test_recall
value: [0.90909091 0.83333333 0.83333333 0.83333333 1. 1.
1. 0.90909091 1. 1. ]
mean value: 0.9318181818181819
key: train_recall
value: [0.99029126 1. 0.97058824 0.99019608 0.99019608 0.99029126
0.97087379 0.98058252 0.98058252 0.98058252]
mean value: 0.9844184275652008
key: test_roc_auc
value: [0.64204545 0.55952381 0.48809524 0.55952381 0.57142857 0.71428571
0.64285714 0.52597403 0.57142857 0.71428571]
mean value: 0.5989448051948052
key: train_roc_auc
value: [0.78085992 0.734375 0.78216912 0.77634804 0.72166054 0.72952063
0.74324939 0.81841626 0.72466626 0.77935376]
mean value: 0.759061892354029
key: test_jcc
value: [0.625 0.58823529 0.55555556 0.58823529 0.66666667 0.73333333
0.6875 0.58823529 0.64705882 0.73333333]
mean value: 0.6413153594771241
key: train_jcc
value: [0.78461538 0.75 0.7734375 0.77692308 0.73722628 0.74452555
0.74626866 0.808 0.73722628 0.77692308]
mean value: 0.7635145797367737
MCC on Blind test: 0.42
Accuracy on Blind test: 0.67
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.67604995 0.61749172 0.58538461 0.58932328 0.74973798 0.71265984
0.60874438 0.61769533 0.61801505 0.55964708]
mean value: 0.6334749221801758
key: score_time
value: [0.01328945 0.01198721 0.01105618 0.0122695 0.01297355 0.01261806
0.01269841 0.01275897 0.01224279 0.01214409]
mean value: 0.01240382194519043
key: test_mcc
value: [0.45868247 0.28690229 0.67460105 0.45361105 0.88949918 0.64465837
0.71350607 0.12182898 0.2548236 0.2987013 ]
mean value: 0.4796814363849572
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.68421053 0.84210526 0.73684211 0.94736842 0.83333333
0.83333333 0.61111111 0.66666667 0.66666667]
mean value: 0.7558479532163742
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.76923077 0.86956522 0.7826087 0.96 0.86956522
0.84210526 0.72 0.76923077 0.72727273]
mean value: 0.8109578659326944
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 0.71428571 0.90909091 0.81818182 0.92307692 0.83333333
1. 0.64285714 0.66666667 0.72727273]
mean value: 0.7949050949050949
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.83333333 0.83333333 0.75 1. 0.90909091
0.72727273 0.81818182 0.90909091 0.72727273]
mean value: 0.8416666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.70454545 0.63095238 0.8452381 0.73214286 0.92857143 0.81168831
0.86363636 0.55194805 0.5974026 0.64935065]
mean value: 0.7315476190476191
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.625 0.76923077 0.64285714 0.92307692 0.76923077
0.72727273 0.5625 0.625 0.57142857]
mean value: 0.688226356976357
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.3
Accuracy on Blind test: 0.65
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01134348 0.01067019 0.00884771 0.00852466 0.0083487 0.00794363
0.00782943 0.00811911 0.00781536 0.0099113 ]
mean value: 0.008935356140136718
key: score_time
value: [0.01315331 0.00911045 0.00869799 0.00861573 0.00856686 0.00791645
0.00785732 0.00792217 0.00788569 0.00925827]
mean value: 0.008898425102233886
key: test_mcc
value: [0.45361105 0.89559105 1. 0.89559105 0.67460105 0.66254135
0.89188259 0.26856633 0.88640526 0.76623377]
mean value: 0.7395023497912928
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.94736842 1. 0.94736842 0.84210526 0.83333333
0.94444444 0.66666667 0.94444444 0.88888889]
mean value: 0.8751461988304093
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 0.95652174 1. 0.95652174 0.86956522 0.85714286
0.95238095 0.75 0.95652174 0.90909091]
mean value: 0.8990353849049502
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 1. 1. 1. 0.90909091 0.9
1. 0.69230769 0.91666667 0.90909091]
mean value: 0.9077156177156177
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 0.91666667 1. 0.91666667 0.83333333 0.81818182
0.90909091 0.81818182 1. 0.90909091]
mean value: 0.8939393939393939
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.72159091 0.95833333 1. 0.95833333 0.8452381 0.83766234
0.95454545 0.62337662 0.92857143 0.88311688]
mean value: 0.8710768398268398
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 0.91666667 1. 0.91666667 0.76923077 0.75
0.90909091 0.6 0.91666667 0.83333333]
mean value: 0.8254512154512155
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.51
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08982825 0.08871984 0.08893657 0.08521676 0.08517504 0.09118032
0.09140134 0.09060311 0.08537102 0.08313799]
mean value: 0.08795702457427979
key: score_time
value: [0.01784706 0.01710677 0.01794147 0.01741266 0.0171783 0.01783466
0.01772738 0.01788592 0.01994085 0.01653624]
mean value: 0.017741131782531738
key: test_mcc
value: [0.33796318 0.65477023 0.65477023 0.54761905 0.88949918 0.76623377
0.88640526 0.26856633 0.67005939 0.77742884]
mean value: 0.6453315461934368
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.68421053 0.84210526 0.84210526 0.78947368 0.94736842 0.88888889
0.94444444 0.66666667 0.83333333 0.88888889]
mean value: 0.8327485380116959
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.88 0.88 0.83333333 0.96 0.90909091
0.95652174 0.75 0.88 0.91666667]
mean value: 0.8715612648221344
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.69230769 0.84615385 0.84615385 0.83333333 0.92307692 0.90909091
0.91666667 0.69230769 0.78571429 0.84615385]
mean value: 0.8290959040959041
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 0.91666667 0.91666667 0.83333333 1. 0.90909091
1. 0.81818182 1. 1. ]
mean value: 0.9212121212121213
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.65909091 0.81547619 0.81547619 0.77380952 0.92857143 0.88311688
0.92857143 0.62337662 0.78571429 0.85714286]
mean value: 0.8070346320346321
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.78571429 0.78571429 0.71428571 0.92307692 0.83333333
0.91666667 0.6 0.78571429 0.84615385]
mean value: 0.779065934065934
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.36
Accuracy on Blind test: 0.65
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0070622 0.00688052 0.00693893 0.00693154 0.00710917 0.00692058
0.00734472 0.0074923 0.00760174 0.00686693]
mean value: 0.007114863395690918
key: score_time
value: [0.00797033 0.00801826 0.00802183 0.0084672 0.00798321 0.00860476
0.00797367 0.00884962 0.0084374 0.00873232]
mean value: 0.008305859565734864
key: test_mcc
value: [ 0.4719399 0.20935895 0.32142857 0.01163105 0.0952381 -0.06493506
0.20385888 0.11396058 -0.0805823 0.2548236 ]
mean value: 0.1536722257776471
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 0.57894737 0.68421053 0.52631579 0.57894737 0.44444444
0.61111111 0.55555556 0.5 0.66666667]
mean value: 0.5883040935672514
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76190476 0.6 0.75 0.60869565 0.66666667 0.44444444
0.66666667 0.6 0.60869565 0.76923077]
mean value: 0.6476304613261135
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.75 0.75 0.63636364 0.66666667 0.57142857
0.7 0.66666667 0.58333333 0.66666667]
mean value: 0.6791125541125541
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 0.5 0.75 0.58333333 0.66666667 0.36363636
0.63636364 0.54545455 0.63636364 0.90909091]
mean value: 0.6318181818181818
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73863636 0.60714286 0.66071429 0.50595238 0.54761905 0.46753247
0.6038961 0.55844156 0.46103896 0.5974026 ]
mean value: 0.5748376623376623
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61538462 0.42857143 0.6 0.4375 0.5 0.28571429
0.5 0.42857143 0.4375 0.625 ]
mean value: 0.4858241758241758
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.57
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.08540225 1.07859063 1.06257463 1.06920266 1.05264735 1.05303788
1.1160934 1.0755322 1.05426216 1.04404473]
mean value: 1.0691387891769408
key: score_time
value: [0.09499049 0.09243846 0.09022665 0.09180641 0.08709741 0.08691168
0.08712554 0.08879185 0.0880568 0.08689451]
mean value: 0.08943397998809814
key: test_mcc
value: [0.45868247 1. 1. 0.77380952 0.88949918 0.76623377
1. 0.56061191 0.88640526 0.64465837]
mean value: 0.7979900484560085
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 1. 1. 0.89473684 0.94736842 0.88888889
1. 0.77777778 0.94444444 0.83333333]
mean value: 0.9023391812865497
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 1. 1. 0.91666667 0.96 0.90909091
1. 0.84615385 0.95652174 0.86956522]
mean value: 0.9257998378433161
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.71428571 1. 1. 0.91666667 0.92307692 0.90909091
1. 0.73333333 0.91666667 0.83333333]
mean value: 0.8946453546453547
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 1. 1. 0.91666667 1. 0.90909091
1. 1. 1. 0.90909091]
mean value: 0.9643939393939394
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.70454545 1. 1. 0.88690476 0.92857143 0.88311688
1. 0.71428571 0.92857143 0.81168831]
mean value: 0.8857683982683983
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 1. 1. 0.84615385 0.92307692 0.83333333
1. 0.73333333 0.91666667 0.76923077]
mean value: 0.8688461538461538
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.17
Accuracy on Blind test: 0.56
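The per-fold metrics can be cross-checked against each other. For Random Forest fold 1 (19 test rows), recall 0.90909091 and precision 0.71428571 are consistent with TP=10, FN=1, FP=4, TN=4. These counts are inferred from the logged values and are not printed by the script. The sketch below rebuilds label vectors with those counts and recomputes each `test_*` metric with scikit-learn. `test_roc_auc` only reproduces the logged 0.70454545 if AUC is computed from hard class labels, where it reduces to (TPR + TNR) / 2. Treat that as an assumption about how the script configured its scorer.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

# Confusion counts inferred (not logged) for Random Forest, fold 1
TP, FN, FP, TN = 10, 1, 4, 4

# Rebuild label vectors with exactly those counts:
# 11 true positives-class rows, then 8 negative-class rows
y_true = np.array([1] * (TP + FN) + [0] * (FP + TN))
y_pred = np.array([1] * TP + [0] * FN + [1] * FP + [0] * TN)

scores = {
    "test_mcc": matthews_corrcoef(y_true, y_pred),
    "test_accuracy": accuracy_score(y_true, y_pred),
    "test_fscore": f1_score(y_true, y_pred),
    "test_precision": precision_score(y_true, y_pred),
    "test_recall": recall_score(y_true, y_pred),
    # AUC on hard 0/1 predictions collapses to (TPR + TNR) / 2
    "test_roc_auc": roc_auc_score(y_true, y_pred),
    "test_jcc": jaccard_score(y_true, y_pred),
}
for key, value in scores.items():
    print(f"{key}: {value:.8f}")
```

Every printed value should agree with the fold-1 entries logged for Random Forest to within rounding.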
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.74963999 0.88859892 0.84024143 0.96218228 0.93069863 0.91661644
0.85723209 0.860641 0.84781289 0.82444263]
mean value: 0.9678106307983398
key: score_time
value: [0.22211456 0.18948603 0.20391059 0.21200871 0.21816325 0.22368956
0.13365841 0.19525099 0.19360924 0.23414063]
mean value: 0.20260319709777833
key: test_mcc
value: [0.60553007 0.89559105 0.88949918 0.77380952 0.88949918 0.76623377
1. 0.39594419 0.88640526 0.77742884]
mean value: 0.7879941063317681
key: train_mcc
value: [0.89849587 0.86235326 0.8501742 0.86235326 0.87457979 0.86499607
0.86279135 0.89953068 0.87498674 0.8872319 ]
mean value: 0.8737493106163656
key: test_accuracy
value: [0.78947368 0.94736842 0.94736842 0.89473684 0.94736842 0.88888889
1. 0.72222222 0.94444444 0.88888889]
mean value: 0.8970760233918128
key: train_accuracy
value: [0.95180723 0.93373494 0.92771084 0.93373494 0.93975904 0.93413174
0.93413174 0.95209581 0.94011976 0.94610778]
mean value: 0.9393333814299113
key: test_fscore
value: [0.84615385 0.95652174 0.96 0.91666667 0.96 0.90909091
1. 0.8 0.95652174 0.91666667]
mean value: 0.9221621566838958
key: train_fscore
value: [0.96226415 0.94835681 0.94392523 0.94835681 0.95283019 0.94930876
0.94883721 0.96226415 0.95327103 0.95774648]
mean value: 0.9527160811207689
key: test_precision
value: [0.73333333 1. 0.92307692 0.91666667 0.92307692 0.90909091
1. 0.71428571 0.91666667 0.84615385]
mean value: 0.8882350982350983
key: train_precision
value: [0.93577982 0.90990991 0.90178571 0.90990991 0.91818182 0.90350877
0.91071429 0.93577982 0.91891892 0.92727273]
mean value: 0.9171761689150632
key: test_recall
value: [1. 0.91666667 1. 0.91666667 1. 0.90909091
1. 0.90909091 1. 1. ]
mean value: 0.9651515151515151
key: train_recall
value: [0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 1.
0.99029126 0.99029126 0.99029126 0.99029126]
mean value: 0.9912240624405102
key: test_roc_auc
value: [0.75 0.95833333 0.92857143 0.88690476 0.92857143 0.88311688
1. 0.66883117 0.92857143 0.85714286]
mean value: 0.879004329004329
key: train_roc_auc
value: [0.93959008 0.91697304 0.90916054 0.91697304 0.92478554 0.9140625
0.91702063 0.94045813 0.92483313 0.93264563]
mean value: 0.9236502256646996
key: test_jcc
value: [0.73333333 0.91666667 0.92307692 0.84615385 0.92307692 0.83333333
1. 0.66666667 0.91666667 0.84615385]
mean value: 0.8605128205128205
key: train_jcc
value: [0.92727273 0.90178571 0.89380531 0.90178571 0.90990991 0.90350877
0.90265487 0.92727273 0.91071429 0.91891892]
mean value: 0.9097628946580972
MCC on Blind test: 0.25
Accuracy on Blind test: 0.59
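The FutureWarning above says `max_features='auto'` is deprecated in scikit-learn 1.1 and will be removed in 1.3, and that `'sqrt'` keeps the previous behaviour for random forests. A minimal sketch of the "Random Forest2" pipeline with the parameter set explicitly is shown below. The toy DataFrame is hypothetical and uses only four of the logged column names; the real numeric and categorical lists are longer and partly truncated in this log.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 60
# Hypothetical toy data reusing a few column names from the log
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = rng.integers(0, 2, n)

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    # 'sqrt' replaces the deprecated 'auto' with the same behaviour
    ("model", RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                                     n_estimators=1000, n_jobs=10,
                                     oob_score=True, random_state=42)),
])

# Record warnings during fit to confirm none mention max_features
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

deprecations = [w for w in caught if "max_features" in str(w.message)]
print("max_features warnings:", len(deprecations))
print("OOB score:", round(pipe.named_steps["model"].oob_score_, 2))
```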
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00777102 0.00735879 0.00705314 0.00707221 0.00719404 0.00715971
0.00785017 0.00709414 0.00755572 0.00760627]
mean value: 0.00737152099609375
key: score_time
value: [0.00856996 0.00846505 0.00813699 0.00842476 0.00811648 0.00864673
0.00846338 0.00875568 0.00872493 0.00881457]
mean value: 0.008511853218078614
key: test_mcc
value: [ 0.21660006 0.32142857 0.23262105 0.28690229 0.28690229 0.43320011
0.16116459 -0.24029619 0.40291148 0.40291148]
mean value: 0.2504345746462975
key: train_mcc
value: [0.34619876 0.33098314 0.29538063 0.35569507 0.35404664 0.3240165
0.35981593 0.37214605 0.27958995 0.33041139]
mean value: 0.3348284059138056
key: test_accuracy
value: [0.63157895 0.68421053 0.63157895 0.68421053 0.68421053 0.72222222
0.61111111 0.44444444 0.72222222 0.72222222]
mean value: 0.6538011695906433
key: train_accuracy
value: [0.69879518 0.69277108 0.6746988 0.70481928 0.69879518 0.68862275
0.70658683 0.71257485 0.67065868 0.68862275]
mean value: 0.6936945386335762
key: test_fscore
value: [0.72 0.75 0.69565217 0.76923077 0.76923077 0.76190476
0.69565217 0.58333333 0.7826087 0.7826087 ]
mean value: 0.7310221372830068
key: train_fscore
value: [0.76635514 0.76497696 0.74766355 0.77625571 0.76190476 0.75925926
0.77625571 0.78181818 0.74885845 0.75471698]
mean value: 0.7638064697242107
key: test_precision
value: [0.64285714 0.75 0.72727273 0.71428571 0.71428571 0.8
0.66666667 0.53846154 0.75 0.75 ]
mean value: 0.7053829503829504
key: train_precision
value: [0.73873874 0.72173913 0.71428571 0.72649573 0.74074074 0.72566372
0.73275862 0.73504274 0.70689655 0.73394495]
mean value: 0.7276306629094831
key: test_recall
value: [0.81818182 0.75 0.66666667 0.83333333 0.83333333 0.72727273
0.72727273 0.63636364 0.81818182 0.81818182]
mean value: 0.7628787878787879
key: train_recall
value: [0.7961165 0.81372549 0.78431373 0.83333333 0.78431373 0.7961165
0.82524272 0.83495146 0.7961165 0.77669903]
mean value: 0.8040928992956405
key: test_roc_auc
value: [0.59659091 0.66071429 0.61904762 0.63095238 0.63095238 0.72077922
0.57792208 0.38961039 0.69480519 0.69480519]
mean value: 0.6216179653679654
key: train_roc_auc
value: [0.66789952 0.65686275 0.64215686 0.66666667 0.67340686 0.65587075
0.67043386 0.67528823 0.63243325 0.66178701]
mean value: 0.6602805766319473
key: test_jcc
value: [0.5625 0.6 0.53333333 0.625 0.625 0.61538462
0.53333333 0.41176471 0.64285714 0.64285714]
mean value: 0.5792030273647921
key: train_jcc
value: [0.62121212 0.61940299 0.59701493 0.63432836 0.61538462 0.6119403
0.63432836 0.64179104 0.59854015 0.60606061]
mean value: 0.6180003458791998
MCC on Blind test: 0.51
Accuracy on Blind test: 0.74
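Each key / value / mean value triplet matches the dictionary returned by scikit-learn's `cross_validate` with a named scoring dict and `return_train_score=True`: it holds `fit_time` and `score_time` plus a `test_<name>` and `train_<name>` array per scorer, one entry per fold. The sketch assumes the script does something similar. The scorer names are copied from the logged keys; the synthetic data, the BernoulliNB model and the 10-fold `StratifiedKFold` settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for the real training set
X, y = make_classification(n_samples=185, n_features=10, random_state=42)

# Scorer names chosen to reproduce the logged keys (assumption)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(BernoulliNB(), X, y, cv=cv,
                         scoring=scoring, return_train_score=True)

# Mimic the log layout: per-fold values, then their mean
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```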
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.07428098 0.04616737 0.04378724 0.04078841 0.05201912 0.16673613
0.0367384 0.03498602 0.03795505 0.03922486]
mean value: 0.0572683572769165
key: score_time
value: [0.0104301 0.0102849 0.01059461 0.01035333 0.01026797 0.01001692
0.00953746 0.00964141 0.00958657 0.00953507]
mean value: 0.010024833679199218
key: test_mcc
value: [0.56729535 0.88949918 0.89559105 1. 0.77380952 0.76623377
1. 0.39594419 0.88640526 0.66254135]
mean value: 0.7837319665122223
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78947368 0.94736842 0.94736842 1. 0.89473684 0.88888889
1. 0.72222222 0.94444444 0.83333333]
mean value: 0.8967836257309941
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83333333 0.96 0.95652174 1. 0.91666667 0.90909091
1. 0.8 0.95652174 0.85714286]
mean value: 0.9189277244494636
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76923077 0.92307692 1. 1. 0.91666667 0.90909091
1. 0.71428571 0.91666667 0.9 ]
mean value: 0.9049017649017649
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 1. 0.91666667 1. 0.91666667 0.90909091
1. 0.90909091 1. 0.81818182]
mean value: 0.9378787878787879
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.76704545 0.92857143 0.95833333 1. 0.88690476 0.88311688
1. 0.66883117 0.92857143 0.83766234]
mean value: 0.8859036796536797
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.71428571 0.92307692 0.91666667 1. 0.84615385 0.83333333
1. 0.66666667 0.91666667 0.75 ]
mean value: 0.8566849816849816
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.52
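Because F1 = 2TP / (2TP + FP + FN) and the Jaccard index = TP / (TP + FP + FN), the two are linked by J = F1 / (2 - F1) for binary labels. The check below applies that identity to the XGBoost `test_fscore` values copied from the log and compares the result with the logged `test_jcc`. Agreement to rounding error is what you would expect if both metrics were computed on the same predictions and the same positive class.

```python
# Logged XGBoost per-fold test_fscore and test_jcc (copied from the log)
test_fscore = [0.83333333, 0.96, 0.95652174, 1.0, 0.91666667, 0.90909091,
               1.0, 0.8, 0.95652174, 0.85714286]
test_jcc = [0.71428571, 0.92307692, 0.91666667, 1.0, 0.84615385, 0.83333333,
            1.0, 0.66666667, 0.91666667, 0.75]


def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


implied = [jaccard_from_f1(f) for f in test_fscore]
max_gap = max(abs(a - b) for a, b in zip(implied, test_jcc))
print(f"largest |implied - logged| jcc gap: {max_gap:.1e}")
```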
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01267529 0.01247311 0.01756597 0.03184366 0.03098655 0.03128266
0.03108382 0.03088689 0.03076506 0.03127789]
mean value: 0.026084089279174806
key: score_time
value: [0.01049232 0.01059413 0.02069259 0.01077557 0.01968479 0.0106771
0.02043557 0.01927018 0.01058149 0.02066064]
mean value: 0.015386438369750977
key: test_mcc
value: [0.45361105 0.67460105 0.88949918 0.89559105 0.89559105 0.89188259
0.79772404 0.53246753 0.56061191 0.56980288]
mean value: 0.7161382335698945
key: train_mcc
value: [0.92325474 0.82122399 0.84675102 0.83387364 0.84675102 0.84729198
0.87296284 0.86004923 0.89835373 0.86032048]
mean value: 0.8610832667133086
key: test_accuracy
value: [0.73684211 0.84210526 0.94736842 0.94736842 0.94736842 0.94444444
0.88888889 0.77777778 0.77777778 0.77777778]
mean value: 0.8587719298245614
key: train_accuracy
value: [0.96385542 0.91566265 0.92771084 0.92168675 0.92771084 0.92814371
0.94011976 0.93413174 0.95209581 0.93413174]
mean value: 0.9345249260515114
key: test_fscore
value: [0.7826087 0.86956522 0.96 0.95652174 0.95652174 0.95238095
0.9 0.81818182 0.84615385 0.8 ]
mean value: 0.8841934008020964
key: train_fscore
value: [0.97087379 0.93203883 0.94230769 0.93719807 0.94230769 0.94285714
0.95238095 0.94736842 0.96153846 0.9468599 ]
mean value: 0.9475730954818289
key: test_precision
value: [0.75 0.90909091 0.92307692 1. 1. 1.
1. 0.81818182 0.73333333 0.88888889]
mean value: 0.9022571872571873
key: train_precision
value: [0.97087379 0.92307692 0.9245283 0.92380952 0.9245283 0.92523364
0.93457944 0.93396226 0.95238095 0.94230769]
mean value: 0.9355280830019537
key: test_recall
value: [0.81818182 0.83333333 1. 0.91666667 0.91666667 0.90909091
0.81818182 0.81818182 1. 0.72727273]
mean value: 0.8757575757575757
key: train_recall
value: [0.97087379 0.94117647 0.96078431 0.95098039 0.96078431 0.96116505
0.97087379 0.96116505 0.97087379 0.95145631]
mean value: 0.960013325718637
key: test_roc_auc
value: [0.72159091 0.8452381 0.92857143 0.95833333 0.95833333 0.95454545
0.90909091 0.76623377 0.71428571 0.79220779]
mean value: 0.8548430735930737
key: train_roc_auc
value: [0.96162737 0.90808824 0.91789216 0.9129902 0.91789216 0.91808252
0.93074939 0.92589502 0.94637439 0.92885316]
mean value: 0.9268444604783661
key: test_jcc
value: [0.64285714 0.76923077 0.92307692 0.91666667 0.91666667 0.90909091
0.81818182 0.69230769 0.73333333 0.66666667]
mean value: 0.7988078588078588
key: train_jcc
value: [0.94339623 0.87272727 0.89090909 0.88181818 0.89090909 0.89189189
0.90909091 0.9 0.92592593 0.89908257]
mean value: 0.9005751158494797
MCC on Blind test: 0.09
Accuracy on Blind test: 0.54
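The "MCC on Blind test" and "Accuracy on Blind test" lines report two-decimal scores on data held out from cross-validation. A minimal sketch of that final step follows, assuming the pipeline is refitted on the full training set and then scored once on the blind set. The data and the split are synthetic, and the preprocessing is reduced to a single MinMaxScaler for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data; the real training and blind sets come from the script
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

pipe = Pipeline(steps=[("prep", MinMaxScaler()),
                       ("model", LinearDiscriminantAnalysis())])
pipe.fit(X_train, y_train)          # refit on the whole training set
y_pred = pipe.predict(X_blind)      # blind rows never seen during fitting

mcc = matthews_corrcoef(y_blind, y_pred)
acc = accuracy_score(y_blind, y_pred)
print("MCC on Blind test:", round(mcc, 2))
print("Accuracy on Blind test:", round(acc, 2))
```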
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.00939846 0.00705338 0.00695395 0.00734782 0.00737906 0.00728893
0.00739002 0.00729275 0.00736165 0.00728703]
mean value: 0.00747530460357666
key: score_time
value: [0.01311707 0.00814056 0.00845981 0.00825524 0.00828815 0.00828743
0.00835443 0.00829411 0.00827003 0.00832057]
mean value: 0.008778738975524902
key: test_mcc
value: [0.60553007 0.32142857 0.14085904 0.28690229 0.26772484 0.52299758
0.44320263 0.0805823 0.0805823 0.56061191]
mean value: 0.33104215300057266
key: train_mcc
value: [0.34161624 0.39993512 0.3929602 0.3794614 0.42213076 0.39858139
0.42337541 0.42542126 0.32037061 0.3808643 ]
mean value: 0.3884716694211574
key: test_accuracy
value: [0.78947368 0.68421053 0.63157895 0.68421053 0.68421053 0.77777778
0.72222222 0.61111111 0.61111111 0.77777778]
mean value: 0.6973684210526316
key: train_accuracy
value: [0.70481928 0.72289157 0.72289157 0.71686747 0.73493976 0.7245509
0.73652695 0.73652695 0.69461078 0.71856287]
mean value: 0.7213188081667989
key: test_fscore
value: [0.84615385 0.75 0.74074074 0.76923077 0.78571429 0.83333333
0.81481481 0.74074074 0.74074074 0.84615385]
mean value: 0.7867623117623117
key: train_fscore
value: [0.79324895 0.80672269 0.79824561 0.79828326 0.8018018 0.80672269
0.80869565 0.81196581 0.78297872 0.79295154]
mean value: 0.8001616730332606
key: test_precision
value: [0.73333333 0.75 0.66666667 0.71428571 0.6875 0.76923077
0.6875 0.625 0.625 0.73333333]
mean value: 0.6991849816849817
key: train_precision
value: [0.70149254 0.70588235 0.72222222 0.70992366 0.74166667 0.71111111
0.73228346 0.72519084 0.6969697 0.72580645]
mean value: 0.7172549007220933
key: test_recall
value: [1. 0.75 0.83333333 0.83333333 0.91666667 0.90909091
1. 0.90909091 0.90909091 1. ]
mean value: 0.906060606060606
key: train_recall
value: [0.91262136 0.94117647 0.89215686 0.91176471 0.87254902 0.93203883
0.90291262 0.9223301 0.89320388 0.87378641]
mean value: 0.9054540262707025
key: test_roc_auc
value: [0.75 0.66071429 0.55952381 0.63095238 0.60119048 0.74025974
0.64285714 0.52597403 0.52597403 0.71428571]
mean value: 0.6351731601731602
key: train_roc_auc
value: [0.63885036 0.65808824 0.67264093 0.65900735 0.69408701 0.66133192
0.68583131 0.67991505 0.63410194 0.6712682 ]
mean value: 0.6655122313893195
key: test_jcc
value: [0.73333333 0.6 0.58823529 0.625 0.64705882 0.71428571
0.6875 0.58823529 0.58823529 0.73333333]
mean value: 0.6505217086834734
key: train_jcc
value: [0.65734266 0.67605634 0.66423358 0.66428571 0.66917293 0.67605634
0.67883212 0.68345324 0.64335664 0.65693431]
mean value: 0.6669723860782252
MCC on Blind test: 0.51
Accuracy on Blind test: 0.73
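The per-fold output above follows a fixed pattern. It starts with fit_time and score_time, then gives test_/train_ pairs for mcc, accuracy, fscore, precision, recall, roc_auc and jcc, with ten values each. That is the shape of the dict returned by scikit-learn's cross_validate when return_train_score=True. The script's own code is not part of this log, so the block below is only a minimal sketch under that assumption. The scorer names, the shuffled 10-fold StratifiedKFold, the LogisticRegression stand-in and the synthetic data are all hypothetical. With the 185-row training set, 10 folds give test folds of 19 and 18 rows. That is consistent with the fold accuracies logged above (for example 15/19 = 0.78947368 and 14/18 = 0.77777778).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the 185-row training set
X, y = make_classification(n_samples=185, n_features=10, random_state=42)

# Scorer names chosen (assumption) so result keys read test_mcc, train_mcc, ...
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline([("prep", MinMaxScaler()),
                 ("model", LogisticRegression(random_state=42))])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

cross_validate interleaves each test_ key with its train_ counterpart. This matches the ordering seen in the log.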
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00765204 0.00983834 0.00933051 0.01027274 0.00990605 0.00989223
0.01009798 0.01009941 0.01041555 0.01013994]
mean value: 0.009764480590820312
key: score_time
value: [0.00810671 0.00978684 0.00992608 0.0102284 0.01031661 0.01037884
0.01027703 0.01031566 0.01036739 0.01031637]
mean value: 0.01000199317932129
key: test_mcc
value: [0.33796318 0.54761905 0.65477023 0.7824608 0.80507649 0.76623377
0.28203804 0.34188173 0.44320263 0.52299758]
mean value: 0.548424349224469
key: train_mcc
value: [0.88657784 0.85954556 0.72631812 0.76988112 0.84858071 0.83737341
0.56743022 0.76293969 0.77046864 0.79393863]
mean value: 0.7823053934321447
key: test_accuracy
value: [0.68421053 0.78947368 0.84210526 0.89473684 0.89473684 0.88888889
0.5 0.66666667 0.72222222 0.77777778]
mean value: 0.7660818713450293
key: train_accuracy
value: [0.94578313 0.93373494 0.86746988 0.88554217 0.92771084 0.92215569
0.7245509 0.8742515 0.88622754 0.89820359]
mean value: 0.8865630185412308
key: test_fscore
value: [0.75 0.83333333 0.88 0.92307692 0.90909091 0.90909091
0.30769231 0.7 0.81481481 0.83333333]
mean value: 0.7860432530432531
key: train_fscore
value: [0.95566502 0.9468599 0.9009009 0.91479821 0.94059406 0.93596059
0.7125 0.88888889 0.91555556 0.92376682]
mean value: 0.9035489946317999
key: test_precision
value: [0.69230769 0.83333333 0.84615385 0.85714286 1. 0.90909091
1. 0.77777778 0.6875 0.76923077]
mean value: 0.8372537185037185
key: train_precision
value: [0.97 0.93333333 0.83333333 0.84297521 0.95 0.95
1. 0.97674419 0.8442623 0.85833333]
mean value: 0.9158981687740049
key: test_recall
value: [0.81818182 0.83333333 0.91666667 1. 0.83333333 0.90909091
0.18181818 0.63636364 1. 0.90909091]
mean value: 0.8037878787878788
key: train_recall
value: [0.94174757 0.96078431 0.98039216 1. 0.93137255 0.9223301
0.55339806 0.81553398 1. 1. ]
mean value: 0.9105558728345707
key: test_roc_auc
value: [0.65909091 0.77380952 0.81547619 0.85714286 0.91666667 0.88311688
0.59090909 0.67532468 0.64285714 0.74025974]
mean value: 0.755465367965368
key: train_roc_auc
value: [0.94706426 0.92570466 0.83394608 0.8515625 0.92662377 0.92210255
0.77669903 0.89214199 0.8515625 0.8671875 ]
mean value: 0.879459484036333
key: test_jcc
value: [0.6 0.71428571 0.78571429 0.85714286 0.83333333 0.83333333
0.18181818 0.53846154 0.6875 0.71428571]
mean value: 0.6745874958374959
key: train_jcc
value: [0.91509434 0.89908257 0.81967213 0.84297521 0.88785047 0.87962963
0.55339806 0.8 0.8442623 0.85833333]
mean value: 0.830029802977617
MCC on Blind test: 0.13
Accuracy on Blind test: 0.55
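Each "Running model pipeline" entry prints the same preprocessing step. A ColumnTransformer min-max scales the numerical columns and one-hot encodes the categorical columns (ss_class, aa_prop_change, electrostatics_change, polarity_change, water_change, drtype_mode_labels, active_site). Any remaining columns pass through unchanged (remainder='passthrough'). The classifier under test follows. The block below is a minimal sketch of that structure. Only the column names come from the log. The four-column DataFrame, its values and the labels are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data reusing a few column names from the log
df = pd.DataFrame({
    "ligand_distance": [5.1, 12.3, 8.7, 20.4, 3.2, 15.0],
    "rsa": [0.10, 0.55, 0.32, 0.80, 0.05, 0.60],
    "ss_class": ["helix", "sheet", "coil", "helix", "coil", "sheet"],
    "active_site": [1, 0, 0, 0, 1, 0],
})
y = [1, 0, 1, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Scale numeric columns to [0, 1], one-hot encode categoricals,
# and pass any other column through unchanged
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline([("prep", prep),
                 ("model", PassiveAggressiveClassifier(random_state=42))])
pipe.fit(df, y)

# 2 scaled numeric columns + 3 ss_class dummies + 2 active_site dummies
Xt = pipe.named_steps["prep"].transform(df)
print(Xt.shape)
```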
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01018405 0.01003146 0.00991631 0.01017022 0.01003242 0.01013088
0.00956511 0.00980854 0.01098204 0.01009631]
mean value: 0.010091733932495118
key: score_time
value: [0.01030588 0.01020026 0.0102222 0.01032352 0.01022243 0.01029515
0.01032782 0.01033735 0.01036525 0.01023245]
mean value: 0.010283231735229492
key: test_mcc
value: [0.5077524 0.51887452 0.72456884 0.80507649 0.6761234 0.66254135
0.1934765 0.44320263 0.67005939 0.43320011]
mean value: 0.5634875632497535
key: train_mcc
value: [0.73618348 0.59399514 0.70269787 0.86061598 0.60495638 0.88573143
0.29075534 0.82931725 0.66982421 0.82396818]
mean value: 0.6998045257344441
key: test_accuracy
value: [0.73684211 0.68421053 0.84210526 0.89473684 0.84210526 0.83333333
0.44444444 0.72222222 0.83333333 0.72222222]
mean value: 0.7555555555555555
key: train_accuracy
value: [0.87349398 0.75301205 0.84337349 0.93373494 0.80120482 0.94610778
0.50299401 0.91616766 0.80838323 0.91616766]
mean value: 0.8294639636389871
key: test_fscore
value: [0.81481481 0.66666667 0.85714286 0.90909091 0.88888889 0.85714286
0.16666667 0.81481481 0.88 0.76190476]
mean value: 0.7617133237133237
key: train_fscore
value: [0.9058296 0.75151515 0.86021505 0.94581281 0.86075949 0.95652174
0.32520325 0.93636364 0.81818182 0.93137255]
mean value: 0.8291775097971825
key: test_precision
value: [0.6875 1. 1. 1. 0.8 0.9
1. 0.6875 0.78571429 0.8 ]
mean value: 0.8660714285714286
key: train_precision
value: [0.84166667 0.98412698 0.95238095 0.95049505 0.75555556 0.95192308
1. 0.88034188 0.98630137 0.94059406]
mean value: 0.924338559476902
key: test_recall
value: [1. 0.5 0.75 0.83333333 1. 0.81818182
0.09090909 1. 1. 0.72727273]
mean value: 0.771969696969697
key: train_recall
value: [0.98058252 0.60784314 0.78431373 0.94117647 1. 0.96116505
0.19417476 1. 0.69902913 0.9223301 ]
mean value: 0.8090614886731392
key: test_roc_auc
value: [0.6875 0.75 0.875 0.91666667 0.78571429 0.83766234
0.54545455 0.64285714 0.78571429 0.72077922]
mean value: 0.7547348484848485
key: train_roc_auc
value: [0.83949761 0.79610907 0.86090686 0.93152574 0.7421875 0.94152002
0.59708738 0.890625 0.84170206 0.91429005]
mean value: 0.8355451292572045
key: test_jcc
value: [0.6875 0.5 0.75 0.83333333 0.8 0.75
0.09090909 0.6875 0.78571429 0.61538462]
mean value: 0.6500341325341326
key: train_jcc
value: [0.82786885 0.60194175 0.75471698 0.89719626 0.75555556 0.91666667
0.19417476 0.88034188 0.69230769 0.87155963]
mean value: 0.7392330028027021
MCC on Blind test: 0.22
Accuracy on Blind test: 0.57
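For binary labels, the logged jcc and fscore values in a given fold are not independent. With J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), it follows that J = F1/(2 - F1). The check below applies that identity to single-fold values copied from the blocks above. It is an arithmetic sanity check added for illustration, not part of the original script. Because the relation is nonlinear, it holds per fold but not between the printed mean values.

```python
def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

# (test_fscore, test_jcc) pairs copied from individual folds in the log
pairs = [
    (0.84615385, 0.73333333),  # Multinomial NB, fold 1
    (0.30769231, 0.18181818),  # Passive Aggressive, fold 7
    (0.16666667, 0.09090909),  # Stochastic GDescent, fold 7
]

for f1, jcc in pairs:
    print(f"F1={f1:.8f}  implied J={jaccard_from_f1(f1):.8f}  logged J={jcc:.8f}")
```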
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.08395481 0.0697968 0.07071424 0.07110596 0.07078338 0.07142019
0.07246375 0.07124639 0.07080793 0.07235765]
mean value: 0.07246510982513428
key: score_time
value: [0.0151608 0.01497865 0.01519632 0.01489806 0.01543808 0.01554489
0.01553178 0.01531959 0.01536942 0.01543546]
mean value: 0.015287303924560547
key: test_mcc
value: [0.60553007 0.54761905 1. 0.89559105 0.67460105 0.76623377
0.79772404 0.52299758 0.88640526 0.48416483]
mean value: 0.7180866701858623
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78947368 0.78947368 1. 0.94736842 0.84210526 0.88888889
0.88888889 0.77777778 0.94444444 0.72222222]
mean value: 0.8590643274853801
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84615385 0.83333333 1. 0.95652174 0.86956522 0.90909091
0.9 0.83333333 0.95652174 0.73684211]
mean value: 0.8841362222826754
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.73333333 0.83333333 1. 1. 0.90909091 0.90909091
1. 0.76923077 0.91666667 0.875 ]
mean value: 0.8945745920745921
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.83333333 1. 0.91666667 0.83333333 0.90909091
0.81818182 0.90909091 1. 0.63636364]
mean value: 0.8856060606060606
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.77380952 1. 0.95833333 0.8452381 0.88311688
0.90909091 0.74025974 0.92857143 0.74675325]
mean value: 0.8535173160173161
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.73333333 0.71428571 1. 0.91666667 0.76923077 0.83333333
0.81818182 0.71428571 0.91666667 0.58333333]
mean value: 0.799931734931735
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.0
Accuracy on Blind test: 0.5
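The blind-test MCC for AdaBoost is printed as -0.0. Python's round(x, 2) returns -0.0 for a small negative float. The value is therefore consistent with an MCC slightly below zero, i.e. no better than chance, which fits the 0.5 blind-test accuracy. The log does not show the unrounded value, so rounding to two decimals is an assumption. The sketch below computes MCC from confusion-matrix counts with the standard formula and compares it with scikit-learn's matthews_corrcoef. The labels and predictions are hypothetical.

```python
import math

from sklearn.metrics import confusion_matrix, matthews_corrcoef

def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when the denominator is 0, matching scikit-learn's convention
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical blind-test labels and predictions (accuracy 0.5)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
mcc = mcc_from_counts(tp, tn, fp, fn)
print(round(mcc, 4), round(matthews_corrcoef(y_true, y_pred), 4))

# round() keeps the sign of a small negative value, which prints as -0.0
print(round(-0.004, 2))
```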
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.02634501 0.02745032 0.03282142 0.02707553 0.03081536 0.03036213
0.03100204 0.03243303 0.0233736 0.02885413]
mean value: 0.029053258895874023
key: score_time
value: [0.01942372 0.01645255 0.02343774 0.01584172 0.02157259 0.01987481
0.02784896 0.02355909 0.01557946 0.0171895 ]
mean value: 0.020078015327453614
key: test_mcc
value: [0.56729535 0.89559105 0.89559105 1. 0.67460105 0.76623377
0.79772404 0.56061191 0.88640526 0.64465837]
mean value: 0.768871184699948
key: train_mcc
value: [1. 0.97457108 0.97457108 1. 0.98740179 0.98737524
0.96301704 0.97466626 0.98744925 0.94933931]
mean value: 0.9798391039351805
key: test_accuracy
value: [0.78947368 0.94736842 0.94736842 1. 0.84210526 0.88888889
0.88888889 0.77777778 0.94444444 0.83333333]
mean value: 0.8859649122807017
key: train_accuracy
value: [1. 0.98795181 0.98795181 1. 0.9939759 0.99401198
0.98203593 0.98802395 0.99401198 0.9760479 ]
mean value: 0.9904011254599235
key: test_fscore
value: [0.83333333 0.95652174 0.95652174 1. 0.86956522 0.90909091
0.9 0.84615385 0.95652174 0.86956522]
mean value: 0.9097273740752001
key: train_fscore
value: [1. 0.99019608 0.99019608 1. 0.99507389 0.99516908
0.98522167 0.99029126 0.99512195 0.98076923]
mean value: 0.9922039249615477
key: test_precision
value: [0.76923077 1. 1. 1. 0.90909091 0.90909091
1. 0.73333333 0.91666667 0.83333333]
mean value: 0.9070745920745921
key: train_precision
value: [1. 0.99019608 0.99019608 1. 1. 0.99038462
1. 0.99029126 1. 0.97142857]
mean value: 0.9932496605811855
key: test_recall
value: [0.90909091 0.91666667 0.91666667 1. 0.83333333 0.90909091
0.81818182 1. 1. 0.90909091]
mean value: 0.9212121212121211
key: train_recall
value: [1. 0.99019608 0.99019608 1. 0.99019608 1.
0.97087379 0.99029126 0.99029126 0.99029126]
mean value: 0.9912335808109651
key: test_roc_auc
value: [0.76704545 0.95833333 0.95833333 1. 0.8452381 0.88311688
0.90909091 0.71428571 0.92857143 0.81168831]
mean value: 0.8775703463203464
key: train_roc_auc
value: [1. 0.98728554 0.98728554 1. 0.99509804 0.9921875
0.98543689 0.98733313 0.99514563 0.97170813]
mean value: 0.9901480404054825
key: test_jcc
value: [0.71428571 0.91666667 0.91666667 1. 0.76923077 0.83333333
0.81818182 0.73333333 0.91666667 0.76923077]
mean value: 0.8387595737595738
key: train_jcc
value: [1. 0.98058252 0.98058252 1. 0.99019608 0.99038462
0.97087379 0.98076923 0.99029126 0.96226415]
mean value: 0.9845944172615994
MCC on Blind test: 0.1
Accuracy on Blind test: 0.54
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.01885796 0.01930904 0.02562237 0.02129722 0.06075287 0.03211212
0.04384661 0.03204489 0.0758667 0.05150294]
mean value: 0.038121271133422854
key: score_time
value: [0.01133871 0.01133037 0.0113616 0.02015972 0.02054238 0.01122904
0.011343 0.01591635 0.02104354 0.01130295]
mean value: 0.01455676555633545
key: test_mcc
value: [ 0.40219983 0.26772484 0.28690229 0.18531233 0.44908871 0.2548236
0.39594419 -0.05096472 0.3040345 0.67005939]
mean value: 0.31651249570546197
key: train_mcc
value: [0.88606149 0.90075726 0.87457979 0.88685769 0.92515014 0.91320801
0.89953068 0.91188694 0.87498674 0.94997541]
mean value: 0.9022994142722148
key: test_accuracy
value: [0.68421053 0.68421053 0.68421053 0.63157895 0.73684211 0.66666667
0.72222222 0.55555556 0.66666667 0.83333333]
mean value: 0.6865497076023391
key: train_accuracy
value: [0.94578313 0.95180723 0.93975904 0.94578313 0.96385542 0.95808383
0.95209581 0.95808383 0.94011976 0.9760479 ]
mean value: 0.953141908953178
key: test_fscore
value: [0.78571429 0.78571429 0.76923077 0.72 0.82758621 0.76923077
0.8 0.69230769 0.78571429 0.88 ]
mean value: 0.7815498294808639
key: train_fscore
value: [0.95774648 0.96226415 0.95283019 0.95734597 0.97142857 0.96713615
0.96226415 0.96682464 0.95327103 0.98095238]
mean value: 0.9632063716206098
key: test_precision
value: [0.64705882 0.6875 0.71428571 0.69230769 0.70588235 0.66666667
0.71428571 0.6 0.64705882 0.78571429]
mean value: 0.6860760073260074
key: train_precision
value: [0.92727273 0.92727273 0.91818182 0.9266055 0.94444444 0.93636364
0.93577982 0.94444444 0.91891892 0.96261682]
mean value: 0.9341900860429541
key: test_recall
value: [1. 0.91666667 0.83333333 0.75 1. 0.90909091
0.90909091 0.81818182 1. 1. ]
mean value: 0.9136363636363636
key: train_recall
value: [0.99029126 1. 0.99019608 0.99019608 1. 1.
0.99029126 0.99029126 0.99029126 1. ]
mean value: 0.9941557205406435
key: test_roc_auc
value: [0.625 0.60119048 0.63095238 0.58928571 0.64285714 0.5974026
0.66883117 0.48051948 0.57142857 0.78571429]
mean value: 0.6193181818181818
key: train_roc_auc
value: [0.93165357 0.9375 0.92478554 0.93259804 0.953125 0.9453125
0.94045813 0.94827063 0.92483313 0.96875 ]
mean value: 0.9407286539211154
key: test_jcc
value: [0.64705882 0.64705882 0.625 0.5625 0.70588235 0.625
0.66666667 0.52941176 0.64705882 0.78571429]
mean value: 0.6441351540616247
key: train_jcc
value: [0.91891892 0.92727273 0.90990991 0.91818182 0.94444444 0.93636364
0.92727273 0.93577982 0.91071429 0.96261682]
mean value: 0.9291475107022136
MCC on Blind test: 0.33
Accuracy on Blind test: 0.64
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.12610126 0.12060261 0.11894894 0.11790848 0.12793779 0.12083268
0.12258196 0.12219334 0.12002635 0.11419153]
mean value: 0.121132493019104
key: score_time
value: [0.00943565 0.00874519 0.00873065 0.00975442 0.00927114 0.00965858
0.00981712 0.01030612 0.00883412 0.00869703]
mean value: 0.009325003623962403
key: test_mcc
value: [0.45361105 1. 0.80507649 1. 0.88949918 0.76623377
1. 0.56061191 0.88640526 0.64465837]
mean value: 0.8006096026925367
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73684211 1. 0.89473684 1. 0.94736842 0.88888889
1. 0.77777778 0.94444444 0.83333333]
mean value: 0.9023391812865497
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7826087 1. 0.90909091 1. 0.96 0.90909091
1. 0.84615385 0.95652174 0.86956522]
mean value: 0.9233031316509577
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 1. 1. 1. 0.92307692 0.90909091
1. 0.73333333 0.91666667 0.83333333]
mean value: 0.9065501165501165
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 1. 0.83333333 1. 1. 0.90909091
1. 1. 1. 0.90909091]
mean value: 0.946969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.72159091 1. 0.91666667 1. 0.92857143 0.88311688
1. 0.71428571 0.92857143 0.81168831]
mean value: 0.8904491341991343
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.64285714 1. 0.83333333 1. 0.92307692 0.83333333
1. 0.73333333 0.91666667 0.76923077]
mean value: 0.8651831501831502
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.55
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02101755 0.03171253 0.02527022 0.01208258 0.01173353 0.01204896
0.01371312 0.01347113 0.01202703 0.01235557]
mean value: 0.01654322147369385
key: score_time
value: [0.01127648 0.01125073 0.01176476 0.01200485 0.01166868 0.01086617
0.01114273 0.01123142 0.0110383 0.01096892]
mean value: 0.011321306228637695
key: test_mcc
value: [0.4719399 0.40849122 0.09356015 0.44908871 0.56694671 0.26856633
0.44320263 0.0805823 0.0805823 0.66254135]
mean value: 0.3525501597195837
key: train_mcc
value: [0.6002326 0.50998847 0.67610805 0.54823412 0.49142346 0.64107028
0.54903745 0.61519707 0.55309666 0.78305013]
mean value: 0.5967438289020631
key: test_accuracy
value: [0.73684211 0.73684211 0.63157895 0.73684211 0.78947368 0.66666667
0.72222222 0.61111111 0.61111111 0.83333333]
mean value: 0.7076023391812866
key: train_accuracy
value: [0.81325301 0.75903614 0.8373494 0.77710843 0.75903614 0.83233533
0.77844311 0.82035928 0.79041916 0.89820359]
mean value: 0.8065543611572037
key: test_fscore
value: [0.76190476 0.81481481 0.75862069 0.82758621 0.85714286 0.75
0.81481481 0.74074074 0.74074074 0.85714286]
mean value: 0.7923508483853311
key: train_fscore
value: [0.85167464 0.83471074 0.88311688 0.84518828 0.83050847 0.87272727
0.84647303 0.86486486 0.83253589 0.91943128]
mean value: 0.858123135858806
key: test_precision
value: [0.8 0.73333333 0.64705882 0.70588235 0.75 0.69230769
0.6875 0.625 0.625 0.9 ]
mean value: 0.7166082202111614
key: train_precision
value: [0.83962264 0.72142857 0.79069767 0.73722628 0.73134328 0.82051282
0.73913043 0.80672269 0.82075472 0.89814815]
mean value: 0.7905587257811302
key: test_recall
value: [0.72727273 0.91666667 0.91666667 1. 1. 0.81818182
1. 0.90909091 0.90909091 0.81818182]
mean value: 0.9015151515151515
key: train_recall
value: [0.86407767 0.99019608 1. 0.99019608 0.96078431 0.93203883
0.99029126 0.93203883 0.84466019 0.94174757]
mean value: 0.9446030839520274
key: test_roc_auc
value: [0.73863636 0.67261905 0.5297619 0.64285714 0.71428571 0.62337662
0.64285714 0.52597403 0.52597403 0.83766234]
mean value: 0.6454004329004329
key: train_roc_auc
value: [0.7971182 0.69041054 0.7890625 0.71384804 0.69914216 0.80195692
0.71389563 0.78633192 0.7738926 0.88493629]
mean value: 0.7650594784839503
key: test_jcc
value: [0.61538462 0.6875 0.61111111 0.70588235 0.75 0.6
0.6875 0.58823529 0.58823529 0.75 ]
mean value: 0.6583848667672197
key: train_jcc
value: [0.74166667 0.71631206 0.79069767 0.73188406 0.71014493 0.77419355
0.73381295 0.76190476 0.71311475 0.85087719]
mean value: 0.7524608590343069
MCC on Blind test: 0.32
Accuracy on Blind test: 0.61
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0152986 0.01060939 0.01029634 0.01028705 0.01044655 0.01044083
0.01033711 0.01052427 0.01060605 0.01032233]
mean value: 0.010916852951049804
key: score_time
value: [0.01142406 0.01068401 0.01064181 0.01059437 0.01238728 0.01085711
0.01055193 0.0106585 0.01086307 0.01084256]
mean value: 0.010950469970703125
key: test_mcc
value: [0.21660006 0.67460105 0.77380952 0.80507649 0.89559105 0.76623377
0.79772404 0.67005939 0.56061191 0.66254135]
mean value: 0.6822848626279371
key: train_mcc
value: [0.92308458 0.85954556 0.88685769 0.88521749 0.83387364 0.89863369
0.84736815 0.87286094 0.89835373 0.88573143]
mean value: 0.8791526890998981
key: test_accuracy
value: [0.63157895 0.84210526 0.89473684 0.89473684 0.94736842 0.88888889
0.88888889 0.83333333 0.77777778 0.83333333]
mean value: 0.8432748538011696
key: train_accuracy
value: [0.96385542 0.93373494 0.94578313 0.94578313 0.92168675 0.95209581
0.92814371 0.94011976 0.95209581 0.94610778]
mean value: 0.9429406247745473
key: test_fscore
value: [0.72 0.86956522 0.91666667 0.90909091 0.95652174 0.90909091
0.9 0.88 0.84615385 0.85714286]
mean value: 0.8764232144666927
key: train_fscore
value: [0.97115385 0.9468599 0.95734597 0.95652174 0.93719807 0.96190476
0.94230769 0.95192308 0.96153846 0.95652174]
mean value: 0.9543275259667182
key: test_precision
value: [0.64285714 0.90909091 0.91666667 1. 1. 0.90909091
1. 0.78571429 0.73333333 0.9 ]
mean value: 0.8796753246753246
key: train_precision
value: [0.96190476 0.93333333 0.9266055 0.94285714 0.92380952 0.94392523
0.93333333 0.94285714 0.95238095 0.95192308]
mean value: 0.9412930005631284
key: test_recall
value: [0.81818182 0.83333333 0.91666667 0.83333333 0.91666667 0.90909091
0.81818182 1. 1. 0.81818182]
mean value: 0.8863636363636364
key: train_recall
value: [0.98058252 0.96078431 0.99019608 0.97058824 0.95098039 0.98058252
0.95145631 0.96116505 0.97087379 0.96116505]
mean value: 0.967837426232629
key: test_roc_auc
value: [0.59659091 0.8452381 0.88690476 0.91666667 0.95833333 0.88311688
0.90909091 0.78571429 0.71428571 0.83766234]
mean value: 0.8333603896103896
key: train_roc_auc
value: [0.95854523 0.92570466 0.93259804 0.93841912 0.9129902 0.94341626
0.92104066 0.93370752 0.94637439 0.94152002]
mean value: 0.9354316099417114
key: test_jcc
value: [0.5625 0.76923077 0.84615385 0.83333333 0.91666667 0.83333333
0.81818182 0.78571429 0.73333333 0.75 ]
mean value: 0.7848447385947386
key: train_jcc
value: [0.94392523 0.89908257 0.91818182 0.91666667 0.88181818 0.9266055
0.89090909 0.90825688 0.92592593 0.91666667]
mean value: 0.9128038537941651
MCC on Blind test: 0.21
Accuracy on Blind test: 0.59
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.0855453 0.08357239 0.08917785 0.08271241 0.08259749 0.08286834
0.08288574 0.08441114 0.08273721 0.08266902]
mean value: 0.08391768932342529
key: score_time
value: [0.01079178 0.01085567 0.01092529 0.01075959 0.01109052 0.01090479
0.0108192 0.01083088 0.01145196 0.01102829]
mean value: 0.010945796966552734
key: test_mcc
value: [0.21660006 0.67460105 0.77380952 0.89559105 0.89559105 0.76623377
0.79772404 0.67005939 0.56061191 0.56980288]
mean value: 0.6820624726317762
key: train_mcc
value: [0.92308458 0.85980258 0.88685769 0.83387364 0.83400835 0.89863369
0.87296284 0.87286094 0.89835373 0.86032048]
mean value: 0.8740758514702531
key: test_accuracy
value: [0.63157895 0.84210526 0.89473684 0.94736842 0.94736842 0.88888889
0.88888889 0.83333333 0.77777778 0.77777778]
mean value: 0.8429824561403508
key: train_accuracy
value: [0.96385542 0.93373494 0.94578313 0.92168675 0.92168675 0.95209581
0.94011976 0.94011976 0.95209581 0.93413174]
mean value: 0.9405309862203304
key: test_fscore
value: [0.72 0.86956522 0.91666667 0.95652174 0.95652174 0.90909091
0.9 0.88 0.84615385 0.8 ]
mean value: 0.8754520117563596
key: train_fscore
value: [0.97115385 0.94634146 0.95734597 0.93719807 0.93779904 0.96190476
0.95238095 0.95192308 0.96153846 0.9468599 ]
mean value: 0.9524445547956408
key: test_precision
value: [0.64285714 0.90909091 0.91666667 1. 1. 0.90909091
1. 0.78571429 0.73333333 0.88888889]
mean value: 0.8785642135642135
key: train_precision
value: [0.96190476 0.94174757 0.9266055 0.92380952 0.91588785 0.94392523
0.93457944 0.94285714 0.95238095 0.94230769]
mean value: 0.9386005674027249
key: test_recall
value: [0.81818182 0.83333333 0.91666667 0.91666667 0.91666667 0.90909091
0.81818182 1. 1. 0.72727273]
mean value: 0.8856060606060606
key: train_recall
value: [0.98058252 0.95098039 0.99019608 0.95098039 0.96078431 0.98058252
0.97087379 0.96116505 0.97087379 0.95145631]
mean value: 0.9668475157053112
key: test_roc_auc
value: [0.59659091 0.8452381 0.88690476 0.95833333 0.95833333 0.88311688
0.90909091 0.78571429 0.71428571 0.79220779]
mean value: 0.8329816017316017
key: train_roc_auc
value: [0.95854523 0.9286152 0.93259804 0.9129902 0.91007966 0.94341626
0.93074939 0.93370752 0.94637439 0.92885316]
mean value: 0.9325929046780524
key: test_jcc
value: [0.5625 0.76923077 0.84615385 0.91666667 0.91666667 0.83333333
0.81818182 0.78571429 0.73333333 0.66666667]
mean value: 0.7848447385947386
key: train_jcc
value: [0.94392523 0.89814815 0.91818182 0.88181818 0.88288288 0.9266055
0.90909091 0.90825688 0.92592593 0.89908257]
mean value: 0.9093918053821166
MCC on Blind test: 0.1
Accuracy on Blind test: 0.54
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02180219 0.02042246 0.01844001 0.01872444 0.01910233 0.01665592
0.0174613 0.01886535 0.02107906 0.0179646 ]
mean value: 0.019051766395568846
key: score_time
value: [0.01086044 0.01106286 0.01095486 0.01095009 0.01091909 0.01115131
0.01068163 0.010638 0.01093936 0.01097393]
mean value: 0.01091315746307373
key: test_mcc
value: [0.58002308 0.48856385 0.41096386 0.56490196 0.74242424 0.74047959
0.82575758 0.91666667 0.83205029 0.63636364]
mean value: 0.6738194748889874
key: train_mcc
value: [0.80500813 0.77565201 0.79548704 0.77563066 0.7469525 0.76601619
0.76597166 0.76597166 0.73817726 0.81557242]
mean value: 0.7750439506475604
key: test_accuracy
value: [0.7826087 0.73913043 0.69565217 0.7826087 0.86956522 0.86956522
0.91304348 0.95652174 0.90909091 0.81818182]
mean value: 0.833596837944664
key: train_accuracy
value: [0.90243902 0.88780488 0.89756098 0.88780488 0.87317073 0.88292683
0.88292683 0.88292683 0.86893204 0.90776699]
mean value: 0.887426000473597
key: test_fscore
value: [0.73684211 0.75 0.72 0.76190476 0.86956522 0.88
0.91666667 0.95652174 0.91666667 0.81818182]
mean value: 0.832634897520481
key: train_fscore
value: [0.90384615 0.88780488 0.89655172 0.88888889 0.875 0.88349515
0.88118812 0.88118812 0.86699507 0.90731707]
mean value: 0.8872275175238942
key: test_precision
value: [0.875 0.69230769 0.64285714 0.8 0.90909091 0.84615385
0.91666667 1. 0.84615385 0.81818182]
mean value: 0.8346411921411921
key: train_precision
value: [0.8952381 0.89215686 0.91 0.88461538 0.85849057 0.875
0.89 0.89 0.88 0.91176471]
mean value: 0.8887265614518667
key: test_recall
value: [0.63636364 0.81818182 0.81818182 0.72727273 0.83333333 0.91666667
0.91666667 0.91666667 1. 0.81818182]
mean value: 0.8401515151515152
key: train_recall
value: [0.91262136 0.88349515 0.88349515 0.89320388 0.89215686 0.89215686
0.87254902 0.87254902 0.85436893 0.90291262]
mean value: 0.8859508852084523
key: test_roc_auc
value: [0.77651515 0.74242424 0.70075758 0.78030303 0.87121212 0.86742424
0.91287879 0.95833333 0.90909091 0.81818182]
mean value: 0.8337121212121211
key: train_roc_auc
value: [0.90238911 0.887826 0.89762993 0.88777841 0.8732629 0.88297164
0.88287645 0.88287645 0.86893204 0.90776699]
mean value: 0.8874309918142015
key: test_jcc
value: [0.58333333 0.6 0.5625 0.61538462 0.76923077 0.78571429
0.84615385 0.91666667 0.84615385 0.69230769]
mean value: 0.7217445054945055
key: train_jcc
value: [0.8245614 0.79824561 0.8125 0.8 0.77777778 0.79130435
0.78761062 0.78761062 0.76521739 0.83035714]
mean value: 0.7975184916247269
MCC on Blind test: 0.35
Accuracy on Blind test: 0.67
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
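The ConvergenceWarning above comes from the lbfgs solver in sklearn's logistic regression. Both logistic models in the model list (LogisticRegression(random_state=42) and LogisticRegressionCV(random_state=42)) use the default max_iter of 100. The logged pipeline already scales the numerical columns with MinMaxScaler, so the remaining remedy the warning suggests is a larger max_iter. Below is a minimal sketch that raises max_iter and counts any ConvergenceWarnings that are still emitted. It assumes synthetic data from make_classification (185 rows, matching the original data dimensions in the log header) rather than the real training split, and max_iter=5000 is an illustrative choice, not a tuned value.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the training split (assumption: not the real data).
X, y = make_classification(n_samples=185, n_features=50, n_informative=10, random_state=42)

# Same scaler as the logged pipeline, but max_iter raised from the default of 100.
model = make_pipeline(MinMaxScaler(), LogisticRegressionCV(max_iter=5000, random_state=42))

# Record warnings instead of printing them, so any remaining convergence problems can be counted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

n_convergence_warnings = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence_warnings)
```

If warnings persist on the real data, another option the warning text points to is a different solver, for example solver="saga" or solver="liblinear".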
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
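The model list registers ('Naive Bayes', BernoulliNB()) twice, at positions 4 and 13. If the script iterates over this list, as the per-model records (Model_name, Model func, ...) suggest, BernoulliNB is presumably fitted and reported twice under the same name. A small stdlib check can flag repeated names before the loop runs. The names below are copied from the printed list; the check itself is only an illustration and is not part of the original script.

```python
from collections import Counter

# Model names exactly as printed in the logged model list (25 entries).
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees", "Extra Tree",
    "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost", "LDA", "Multinomial",
    "Passive Aggresive", "Stochastic GDescent", "AdaBoost Classifier",
    "Bagging Classifier", "Gaussian Process", "Gradient Boosting", "QDA",
    "Ridge Classifier", "Ridge ClassifierCV",
]

# Names that appear more than once would produce duplicate result records.
duplicates = [name for name, count in Counter(model_names).items() if count > 1]
print("Duplicate model names:", duplicates)
```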
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
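The printed pipeline has two steps. First, a ColumnTransformer ('prep') applies MinMaxScaler to the numerical columns and OneHotEncoder to the seven categorical columns, and passes any other columns through unchanged (remainder='passthrough'). Second, LogisticRegressionCV ('model') is fitted on the result. Below is a minimal sketch of the same structure on a hypothetical four-column toy frame. The column names are borrowed from the log and the values are random. Selecting columns by dtype is an assumption about how the numerical/categorical split could be made, and max_iter=1000 deviates from the logged LogisticRegressionCV(random_state=42).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numerical and two categorical columns.
rng = np.random.default_rng(42)
n = 60
df = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["0", "1"], n),
})
y = (df["ligand_distance"] < 15).astype(int).to_numpy()

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), num_cols),
                      ("cat", OneHotEncoder(), cat_cols)],
        remainder="passthrough")),
    ("model", LogisticRegressionCV(max_iter=1000, random_state=42)),
])
pipe.fit(df, y)

# 2 scaled numerical columns + 3 ss_class dummies + 2 active_site dummies.
n_out = pipe.named_steps["prep"].transform(df).shape[1]
print("Transformed feature count:", n_out)
```

With the default OneHotEncoder(), a category that appears only in the blind test set raises an error at predict time; OneHotEncoder(handle_unknown="ignore") is the usual guard.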
key: fit_time
value: [0.70506549 0.62887144 0.6596272 0.79791594 0.68545508 0.64333749
0.76110959 0.69955564 0.68501639 0.77351475]
mean value: 0.7039469003677368
key: score_time
value: [0.01389217 0.01472902 0.01154828 0.01132441 0.01137733 0.01125288
0.02320051 0.01134515 0.01451206 0.01475286]
mean value: 0.013793468475341797
key: test_mcc
value: [0.76277007 0.66414149 0.48856385 0.74047959 0.74242424 0.82575758
0.65151515 0.58930667 0.68313005 0.83205029]
mean value: 0.6980138984393582
key: train_mcc
value: [0.92211753 0.97077583 0.93174679 0.87320324 0.91259644 0.91259644
0.88292404 0.94163576 1. 1. ]
mean value: 0.9347596066723304
key: test_accuracy
value: [0.86956522 0.82608696 0.73913043 0.86956522 0.86956522 0.91304348
0.82608696 0.7826087 0.81818182 0.90909091]
mean value: 0.8422924901185771
key: train_accuracy
value: [0.96097561 0.98536585 0.96585366 0.93658537 0.95609756 0.95609756
0.94146341 0.97073171 1. 1. ]
mean value: 0.9673170731707317
key: test_fscore
value: [0.84210526 0.83333333 0.75 0.85714286 0.86956522 0.91666667
0.83333333 0.76190476 0.77777778 0.91666667]
mean value: 0.8358495877374597
key: train_fscore
value: [0.96153846 0.98550725 0.96618357 0.93719807 0.95652174 0.95652174
0.94117647 0.97029703 1. 1. ]
mean value: 0.9674944328979426
key: test_precision
value: [1. 0.76923077 0.69230769 0.9 0.90909091 0.91666667
0.83333333 0.88888889 1. 0.84615385]
mean value: 0.8755672105672105
key: train_precision
value: [0.95238095 0.98076923 0.96153846 0.93269231 0.94285714 0.94285714
0.94117647 0.98 1. 1. ]
mean value: 0.9634271708683473
key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.81818182 0.83333333 0.91666667
0.83333333 0.66666667 0.63636364 1. ]
mean value: 0.8159090909090909
key: train_recall
value: [0.97087379 0.99029126 0.97087379 0.94174757 0.97058824 0.97058824
0.94117647 0.96078431 1. 1. ]
mean value: 0.9716923662668951
key: test_roc_auc
value: [0.86363636 0.82954545 0.74242424 0.86742424 0.87121212 0.91287879
0.82575758 0.78787879 0.81818182 0.90909091]
mean value: 0.8428030303030303
key: train_roc_auc
value: [0.96092709 0.98534171 0.96582905 0.93656006 0.9561679 0.9561679
0.94146202 0.97068342 1. 1. ]
mean value: 0.9673139158576052
key: test_jcc
value: [0.72727273 0.71428571 0.6 0.75 0.76923077 0.84615385
0.71428571 0.61538462 0.63636364 0.84615385]
mean value: 0.721913086913087
key: train_jcc
value: [0.92592593 0.97142857 0.93457944 0.88181818 0.91666667 0.91666667
0.88888889 0.94230769 1. 1. ]
mean value: 0.937828203295493
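Each metric is logged as a key, its ten per-fold values, and their plain arithmetic mean. For example, the ten test_mcc values sum to 6.98013898, giving the logged mean of 0.698. The key layout (fit_time, score_time, then test_X and train_X per metric) matches what sklearn.model_selection.cross_validate returns with return_train_score=True and a scoring dict. The sketch below reproduces that output shape on synthetic data. It assumes a scoring dict keyed mcc, accuracy, fscore, precision, recall, roc_auc and jcc, and 10-fold stratified CV. Whether the original script calls cross_validate, and which scorers it uses (in particular for roc_auc), is not shown in the log.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Scorer names chosen so the result keys match the log (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

X, y = make_classification(n_samples=185, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```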
MCC on Blind test: 0.04
Accuracy on Blind test: 0.52
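The blind-test MCC of 0.04 is far below the cross-validated means (test_mcc about 0.70, train_mcc about 0.93). MCC is computed from the binary confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from -1 to 1, and a value near 0 means predictions are barely correlated with the true labels. The stdlib sketch below implements that formula alongside accuracy. The counts are made up for illustration and are not the blind-test confusion matrix.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when any marginal total is empty (scikit-learn uses the same convention).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative counts only.
print(mcc(5, 5, 5, 5), accuracy(5, 5, 5, 5))  # predictions uncorrelated with labels
print(mcc(9, 9, 1, 1), accuracy(9, 9, 1, 1))  # mostly correct predictions
```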
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01285195 0.00952053 0.00782275 0.00767875 0.00741482 0.0068872
0.00694108 0.00817084 0.00734329 0.00712538]
mean value: 0.0081756591796875
key: score_time
value: [0.01059294 0.00885248 0.00852776 0.00859118 0.00863194 0.00855207
0.00864601 0.00876474 0.00838137 0.00814605]
mean value: 0.008768653869628907
key: test_mcc
value: [0.11236664 0.43929769 0.44411739 0.41096386 0.47923384 0.50168817
0.40451992 0.55048188 0.47140452 0.20412415]
mean value: 0.4018198054310791
key: train_mcc
value: [0.3148712 0.50657911 0.52847427 0.5185658 0.43504485 0.51678072
0.4680327 0.45392287 0.49379046 0.43864549]
mean value: 0.4674707470646864
key: test_accuracy
value: [0.52173913 0.65217391 0.69565217 0.69565217 0.69565217 0.73913043
0.65217391 0.73913043 0.68181818 0.59090909]
mean value: 0.6664031620553359
key: train_accuracy
value: [0.60487805 0.72682927 0.73658537 0.72682927 0.68292683 0.73170732
0.70243902 0.69756098 0.7184466 0.68932039]
mean value: 0.7017523087852238
key: test_fscore
value: [0.64516129 0.73333333 0.74074074 0.72 0.77419355 0.78571429
0.75 0.8 0.75862069 0.66666667]
mean value: 0.7374430554819876
key: train_fscore
value: [0.71378092 0.77777778 0.78571429 0.78125 0.74903475 0.77911647
0.76078431 0.75590551 0.77165354 0.75193798]
mean value: 0.7626955550457906
key: test_precision
value: [0.5 0.57894737 0.625 0.64285714 0.63157895 0.6875
0.6 0.66666667 0.61111111 0.5625 ]
mean value: 0.6106161236424394
key: train_precision
value: [0.56111111 0.65771812 0.66442953 0.65359477 0.61783439 0.65986395
0.63398693 0.63157895 0.64900662 0.62580645]
mean value: 0.6354930823444798
key: test_recall
value: [0.90909091 1. 0.90909091 0.81818182 1. 0.91666667
1. 1. 1. 0.81818182]
mean value: 0.9371212121212121
key: train_recall
value: [0.98058252 0.95145631 0.96116505 0.97087379 0.95098039 0.95098039
0.95098039 0.94117647 0.95145631 0.94174757]
mean value: 0.9551399200456882
key: test_roc_auc
value: [0.53787879 0.66666667 0.70454545 0.70075758 0.68181818 0.73106061
0.63636364 0.72727273 0.68181818 0.59090909]
mean value: 0.6659090909090909
key: train_roc_auc
value: [0.60303636 0.72572816 0.73548449 0.72563297 0.68422806 0.73277175
0.70364554 0.69874358 0.7184466 0.68932039]
mean value: 0.7017037883114411
key: test_jcc
value: [0.47619048 0.57894737 0.58823529 0.5625 0.63157895 0.64705882
0.6 0.66666667 0.61111111 0.5 ]
mean value: 0.5862288687404786
key: train_jcc
value: [0.55494505 0.63636364 0.64705882 0.64102564 0.59876543 0.63815789
0.61392405 0.60759494 0.62820513 0.60248447]
mean value: 0.6168525070295942
MCC on Blind test: 0.37
Accuracy on Blind test: 0.65
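The pipeline printed above scales the numerical columns with MinMaxScaler, one-hot encodes the categorical columns, and passes the result to the classifier. A minimal sketch of the same structure follows. The logged numerical column list is truncated ('d...'), so only a small hypothetical subset of the logged column names is used, and the rows are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical subset of the logged feature columns (full list not reproduced).
num_cols = ["ligand_distance", "duet_stability_change", "rsa"]
cat_cols = ["ss_class", "active_site"]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),   # rescale each numeric column to [0, 1]
        ("cat", OneHotEncoder(), cat_cols),  # one indicator column per category level
    ],
    remainder="passthrough",                 # any unlisted column is passed through unchanged
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])

# Synthetic rows standing in for the real training data.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, 40),
    "duet_stability_change": rng.normal(0, 1, 40),
    "rsa": rng.uniform(0, 1, 40),
    "ss_class": rng.choice(["helix", "sheet", "loop"], 40),
    "active_site": rng.choice([0, 1], 40),
})
y = rng.integers(0, 2, 40)

pipe.fit(X, y)
print(pipe.predict(X.head(5)))
```

Because the scaler is fitted inside the pipeline, each cross-validation fold learns its min/max from its own training rows only.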
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0081985 0.00712252 0.00714064 0.00715804 0.00717282 0.00716138
0.00722837 0.00712657 0.00720763 0.00717926]
mean value: 0.007269573211669922
key: score_time
value: [0.00871158 0.00801897 0.0079174 0.00812721 0.00795984 0.00803661
0.00796866 0.00801921 0.00809813 0.00810313]
mean value: 0.008096075057983399
key: test_mcc
value: [0.30240737 0.05427825 0.03816905 0.3030303 0.42228828 0.30240737
0.03816905 0.65151515 0.37796447 0.27272727]
mean value: 0.2762956564879194
key: train_mcc
value: [0.35623111 0.34638101 0.37560698 0.3463735 0.28783552 0.36612372
0.35687769 0.3658258 0.32044877 0.39058328]
mean value: 0.3512287378448915
key: test_accuracy
value: [0.65217391 0.52173913 0.52173913 0.65217391 0.69565217 0.65217391
0.52173913 0.82608696 0.68181818 0.63636364]
mean value: 0.6361660079051383
key: train_accuracy
value: [0.67804878 0.67317073 0.68780488 0.67317073 0.64390244 0.68292683
0.67804878 0.68292683 0.66019417 0.69417476]
mean value: 0.6754368932038836
key: test_fscore
value: [0.6 0.56 0.47619048 0.63636364 0.75862069 0.69230769
0.56 0.83333333 0.72 0.63636364]
mean value: 0.6473179464213947
key: train_fscore
value: [0.68571429 0.67942584 0.69230769 0.67317073 0.64390244 0.67336683
0.68571429 0.67980296 0.65686275 0.70967742]
mean value: 0.6779945226077302
key: test_precision
value: [0.66666667 0.5 0.5 0.63636364 0.64705882 0.64285714
0.53846154 0.83333333 0.64285714 0.63636364]
mean value: 0.6243961920432509
key: train_precision
value: [0.6728972 0.66981132 0.68571429 0.67647059 0.6407767 0.69072165
0.66666667 0.68316832 0.66336634 0.6754386 ]
mean value: 0.6725031656102882
key: test_recall
value: [0.54545455 0.63636364 0.45454545 0.63636364 0.91666667 0.75
0.58333333 0.83333333 0.81818182 0.63636364]
mean value: 0.681060606060606
key: train_recall
value: [0.69902913 0.68932039 0.69902913 0.66990291 0.64705882 0.65686275
0.70588235 0.67647059 0.65048544 0.74757282]
mean value: 0.6841614315629164
key: test_roc_auc
value: [0.64772727 0.52651515 0.51893939 0.65151515 0.68560606 0.64772727
0.51893939 0.82575758 0.68181818 0.63636364]
mean value: 0.634090909090909
key: train_roc_auc
value: [0.67794594 0.67309157 0.68774986 0.67318675 0.64391776 0.6828003
0.67818389 0.68289549 0.66019417 0.69417476]
mean value: 0.6754140491147915
key: test_jcc
value: [0.42857143 0.38888889 0.3125 0.46666667 0.61111111 0.52941176
0.38888889 0.71428571 0.5625 0.46666667]
mean value: 0.4869491129785247
key: train_jcc
value: [0.52173913 0.51449275 0.52941176 0.50735294 0.47482014 0.50757576
0.52173913 0.51492537 0.48905109 0.55 ]
mean value: 0.5131108089860595
MCC on Blind test: 0.35
Accuracy on Blind test: 0.67
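Each block lists fit_time and score_time, then alternating test_/train_ arrays of ten fold scores, each followed by their mean. That layout matches the dictionary returned by scikit-learn's cross_validate with a scoring dict and return_train_score=True. The script's own scoring dict and fold settings are not shown in the log. The sketch below assumes scorer names chosen to reproduce the logged keys and uses synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB

# Assumed scorer names, picked to mirror the log keys (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

X, y = make_classification(n_samples=228, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(BernoulliNB(), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```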
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00741982 0.00747585 0.00764775 0.00689459 0.00716424 0.00761509
0.00779819 0.00776052 0.00681782 0.00763607]
mean value: 0.007422995567321777
key: score_time
value: [0.01033688 0.00987959 0.00997186 0.00971317 0.00995612 0.00998878
0.00992155 0.01000547 0.00932527 0.00970054]
mean value: 0.00987992286682129
key: test_mcc
value: [-0.12878788 0.3030303 0.12878788 0.38932432 0.56490196 0.15096491
0.50460839 0.65909298 0.32539569 0.37796447]
mean value: 0.3275283023309783
key: train_mcc
value: [0.61013747 0.58290698 0.56242364 0.62329827 0.62174364 0.55771431
0.60061066 0.58363235 0.58722022 0.59402749]
mean value: 0.5923715024669853
key: test_accuracy
value: [0.43478261 0.65217391 0.56521739 0.69565217 0.7826087 0.56521739
0.69565217 0.82608696 0.63636364 0.68181818]
mean value: 0.6535573122529644
key: train_accuracy
value: [0.80487805 0.7902439 0.7804878 0.8097561 0.8097561 0.77560976
0.8 0.7902439 0.79126214 0.7961165 ]
mean value: 0.7948354250532796
key: test_fscore
value: [0.43478261 0.63636364 0.54545455 0.66666667 0.8 0.5
0.58823529 0.84615385 0.5 0.63157895]
mean value: 0.6149235544820415
key: train_fscore
value: [0.80952381 0.78172589 0.77386935 0.8 0.8 0.75531915
0.79396985 0.77720207 0.77720207 0.78787879]
mean value: 0.785669097572126
key: test_precision
value: [0.41666667 0.63636364 0.54545455 0.7 0.76923077 0.625
1. 0.78571429 0.8 0.75 ]
mean value: 0.7028429903429904
key: train_precision
value: [0.79439252 0.81914894 0.80208333 0.84782609 0.83870968 0.8255814
0.81443299 0.82417582 0.83333333 0.82105263]
mean value: 0.8220736731371573
key: test_recall
value: [0.45454545 0.63636364 0.54545455 0.63636364 0.83333333 0.41666667
0.41666667 0.91666667 0.36363636 0.54545455]
mean value: 0.5765151515151515
key: train_recall
value: [0.82524272 0.74757282 0.74757282 0.75728155 0.76470588 0.69607843
0.7745098 0.73529412 0.72815534 0.75728155]
mean value: 0.7533695031410622
key: test_roc_auc
value: [0.43560606 0.65151515 0.56439394 0.69318182 0.78030303 0.5719697
0.70833333 0.8219697 0.63636364 0.68181818]
mean value: 0.6545454545454545
key: train_roc_auc
value: [0.80477822 0.79045307 0.78064915 0.81001333 0.80953741 0.77522368
0.79987626 0.78997716 0.79126214 0.7961165 ]
mean value: 0.7947886921758995
key: test_jcc
value: [0.27777778 0.46666667 0.375 0.5 0.66666667 0.33333333
0.41666667 0.73333333 0.33333333 0.46153846]
mean value: 0.45643162393162395
key: train_jcc
value: [0.68 0.64166667 0.63114754 0.66666667 0.66666667 0.60683761
0.65833333 0.63559322 0.63559322 0.65 ]
mean value: 0.6472504921832513
MCC on Blind test: 0.17
Accuracy on Blind test: 0.59
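MCC can be checked by hand from confusion-matrix counts: MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts below are inferred, not read directly from the log. They follow from the ninth K-Nearest Neighbors test fold (22 rows, precision 0.8, recall 0.364, accuracy 0.636), which forces TP=4, FP=1, FN=7 and TN=10. The label vectors themselves are illustrative.

```python
import math
from sklearn.metrics import confusion_matrix, matthews_corrcoef

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Illustrative labels consistent with the ninth KNN fold: 11 positives, 11 negatives.
y_true = [1] * 11 + [0] * 11
y_pred = [1] * 4 + [0] * 7 + [0] * 10 + [1] * 1

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                                  # 4 10 1 7
print(round(mcc_from_counts(tp, tn, fp, fn), 4))       # 0.3254 (log: 0.32539569)
print(round(matthews_corrcoef(y_true, y_pred), 4))     # 0.3254
```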
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00950766 0.0092063 0.0089941 0.00913739 0.00906396 0.00937152
0.00898767 0.0092783 0.00926185 0.00896335]
mean value: 0.009177207946777344
key: score_time
value: [0.00941825 0.00845361 0.00863886 0.00895977 0.00840282 0.00849152
0.00842237 0.00841475 0.00835538 0.00848985]
mean value: 0.008604717254638673
key: test_mcc
value: [0.30240737 0.48856385 0.38932432 0.48075018 0.65151515 0.66414149
0.65151515 0.74242424 0.63636364 0.36514837]
mean value: 0.5372153759035299
key: train_mcc
value: [0.81500527 0.7606076 0.79704499 0.77749321 0.72682277 0.73662669
0.70844205 0.76709739 0.76829494 0.738735 ]
mean value: 0.7596169926310354
key: test_accuracy
value: [0.65217391 0.73913043 0.69565217 0.73913043 0.82608696 0.82608696
0.82608696 0.86956522 0.81818182 0.68181818]
mean value: 0.7673913043478261
key: train_accuracy
value: [0.90731707 0.87804878 0.89756098 0.88780488 0.86341463 0.86829268
0.85365854 0.88292683 0.88349515 0.86893204]
mean value: 0.8791451574709922
key: test_fscore
value: [0.6 0.75 0.66666667 0.7 0.83333333 0.81818182
0.83333333 0.86956522 0.81818182 0.66666667]
mean value: 0.7555928853754941
key: train_fscore
value: [0.90640394 0.87179487 0.89447236 0.88442211 0.8627451 0.86829268
0.84848485 0.87878788 0.88 0.86567164]
mean value: 0.8761075435073197
key: test_precision
value: [0.66666667 0.69230769 0.7 0.77777778 0.83333333 0.9
0.83333333 0.90909091 0.81818182 0.7 ]
mean value: 0.783069153069153
key: train_precision
value: [0.92 0.92391304 0.92708333 0.91666667 0.8627451 0.86407767
0.875 0.90625 0.90721649 0.8877551 ]
mean value: 0.8990707408306566
key: test_recall
value: [0.54545455 0.81818182 0.63636364 0.63636364 0.83333333 0.75
0.83333333 0.83333333 0.81818182 0.63636364]
mean value: 0.7340909090909091
key: train_recall
value: [0.89320388 0.82524272 0.86407767 0.85436893 0.8627451 0.87254902
0.82352941 0.85294118 0.85436893 0.84466019]
mean value: 0.854768703597944
key: test_roc_auc
value: [0.64772727 0.74242424 0.69318182 0.73484848 0.82575758 0.82954545
0.82575758 0.87121212 0.81818182 0.68181818]
mean value: 0.7670454545454546
key: train_roc_auc
value: [0.90738626 0.87830763 0.89772511 0.88796878 0.86341138 0.86831334
0.85351228 0.88278127 0.88349515 0.86893204]
mean value: 0.8791833238149629
key: test_jcc
value: [0.42857143 0.6 0.5 0.53846154 0.71428571 0.69230769
0.71428571 0.76923077 0.69230769 0.5 ]
mean value: 0.6149450549450549
key: train_jcc
value: [0.82882883 0.77272727 0.80909091 0.79279279 0.75862069 0.76724138
0.73684211 0.78378378 0.78571429 0.76315789]
mean value: 0.779879994190339
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
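For the models logged so far, the mean 10-fold test MCC is mostly higher than the blind-test MCC. The numbers below are copied from this log, and the sketch simply ranks the models by that drop.

```python
# (mean 10-fold test MCC, blind-test MCC), copied from the blocks above
summary = {
    "Gaussian NB": (0.4018198054310791, 0.37),
    "Naive Bayes": (0.2762956564879194, 0.35),
    "K-Nearest Neighbors": (0.3275283023309783, 0.17),
    "SVM": (0.5372153759035299, 0.37),
}

# Positive drop = cross-validation estimate was more optimistic than the blind test.
drops = {name: round(cv - bts, 3) for name, (cv, bts) in summary.items()}
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} CV MCC - blind MCC = {drop:+.3f}")
```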
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.7920084 0.75999999 0.51944804 0.73570108 0.81945181 0.88329148
0.80979872 0.6771915 0.85314083 0.79645777]
mean value: 0.7646489620208741
key: score_time
value: [0.0137279 0.01358175 0.01116061 0.01169181 0.01440668 0.01173091
0.01179957 0.01326561 0.01169682 0.01486826]
mean value: 0.01279299259185791
key: test_mcc
value: [0.47727273 0.58930667 0.41096386 0.91605722 0.74242424 0.74047959
0.91605722 0.82575758 0.83205029 0.64715023]
mean value: 0.7097519634627414
key: train_mcc
value: [0.90516294 0.89271776 0.80864195 0.87320324 0.95126594 0.88361919
0.90261781 0.84407425 0.87415728 0.90291262]
mean value: 0.8838372984769908
key: test_accuracy
value: [0.73913043 0.7826087 0.69565217 0.95652174 0.86956522 0.86956522
0.95652174 0.91304348 0.90909091 0.81818182]
mean value: 0.8509881422924901
key: train_accuracy
value: [0.95121951 0.94634146 0.90243902 0.93658537 0.97560976 0.94146341
0.95121951 0.92195122 0.9368932 0.95145631]
mean value: 0.941517878285579
key: test_fscore
value: [0.72727273 0.8 0.72 0.95238095 0.86956522 0.88
0.96 0.91666667 0.91666667 0.8 ]
mean value: 0.8542552230378317
key: train_fscore
value: [0.95327103 0.9468599 0.90740741 0.93719807 0.97560976 0.94230769
0.95145631 0.9223301 0.93779904 0.95145631]
mean value: 0.942569561637334
key: test_precision
value: [0.72727273 0.71428571 0.64285714 1. 0.90909091 0.84615385
0.92307692 0.91666667 0.84615385 0.88888889]
mean value: 0.8414446664446664
key: train_precision
value: [0.91891892 0.94230769 0.86725664 0.93269231 0.97087379 0.9245283
0.94230769 0.91346154 0.9245283 0.95145631]
mean value: 0.9288331487717255
key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.90909091 0.83333333 0.91666667
1. 0.91666667 1. 0.72727273]
mean value: 0.8757575757575757
key: train_recall
value: [0.99029126 0.95145631 0.95145631 0.94174757 0.98039216 0.96078431
0.96078431 0.93137255 0.95145631 0.95145631]
mean value: 0.9571197411003236
key: test_roc_auc
value: [0.73863636 0.78787879 0.70075758 0.95454545 0.87121212 0.86742424
0.95454545 0.91287879 0.90909091 0.81818182]
mean value: 0.8515151515151514
key: train_roc_auc
value: [0.95102798 0.94631639 0.90219874 0.93656006 0.97563297 0.94155721
0.95126594 0.92199695 0.9368932 0.95145631]
mean value: 0.9414905768132495
key: test_jcc
value: [0.57142857 0.66666667 0.5625 0.90909091 0.76923077 0.78571429
0.92307692 0.84615385 0.84615385 0.66666667]
mean value: 0.7546682484182484
key: train_jcc
value: [0.91071429 0.89908257 0.83050847 0.88181818 0.95238095 0.89090909
0.90740741 0.85585586 0.88288288 0.90740741]
mean value: 0.8918967107759675
MCC on Blind test: 0.29
Accuracy on Blind test: 0.64
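The MLP block is preceded by seven ConvergenceWarnings, so several folds stopped at max_iter=500 before the optimizer converged. A minimal sketch for counting such warnings programmatically follows. It uses synthetic data and a deliberately tiny max_iter so the warning is guaranteed.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    # max_iter=5 is intentionally far too small, so the solver stops early.
    MLPClassifier(max_iter=5, random_state=42).fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings:", n_conv)
```

Raising max_iter or setting early_stopping=True are the usual knobs to try. Whether either removes the warning on this data set is untested here.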
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.0114994 0.01038074 0.00875664 0.00789666 0.00787902 0.00791883
0.008286 0.00795555 0.00806046 0.00805378]
mean value: 0.00866870880126953
key: score_time
value: [0.0112102 0.00881648 0.00798893 0.00786543 0.00786877 0.00795722
0.00800657 0.00791478 0.00787568 0.00790453]
mean value: 0.008340859413146972
key: test_mcc
value: [0.74047959 0.41096386 0.74242424 0.91666667 0.58930667 0.83971912
0.58002308 0.91666667 0.83205029 0.91287093]
mean value: 0.7481171113402942
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86956522 0.69565217 0.86956522 0.95652174 0.7826087 0.91304348
0.7826087 0.95652174 0.90909091 0.95454545]
mean value: 0.8689723320158103
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.72 0.86956522 0.95652174 0.76190476 0.90909091
0.81481481 0.95652174 0.9 0.95238095]
mean value: 0.8697942990986469
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.64285714 0.83333333 0.91666667 0.88888889 1.
0.73333333 1. 1. 1. ]
mean value: 0.8915079365079365
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 0.81818182 0.90909091 1. 0.66666667 0.83333333
0.91666667 0.91666667 0.81818182 0.90909091]
mean value: 0.8606060606060606
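The per-fold F-scores are consistent with F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The check below uses the logged precision and recall for folds 1 and 2 of the Decision Tree run.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Fold 1: precision 0.9, recall 0.81818182 (logged test_fscore 0.85714286).
f1_fold1 = f1_from_pr(0.9, 0.81818182)
# Fold 2: precision 0.64285714, recall 0.81818182 (logged test_fscore 0.72).
f1_fold2 = f1_from_pr(0.64285714, 0.81818182)
```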
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86742424 0.70075758 0.87121212 0.95833333 0.78787879 0.91666667
0.77651515 0.95833333 0.90909091 0.95454545]
mean value: 0.8700757575757576
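The logged `test_roc_auc` values appear consistent with ROC AUC computed from hard 0/1 predictions rather than from probabilities. In that case the ROC curve has a single interior point, and AUC reduces to balanced accuracy, (TPR + TNR) / 2. The sketch below illustrates this with labels reconstructed (an inference, not logged data) to match fold 1: recall 9/11 and precision 9/10 over 23 samples give TP=9, FN=2, FP=1, TN=11.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Hypothetical labels reconstructed from fold 1 counts: TP=9, FN=2, FP=1, TN=11.
y_true = np.array([1] * 11 + [0] * 12)
y_pred = np.array([1] * 9 + [0] * 2 + [1] * 1 + [0] * 11)

# With hard predictions, AUC = (1 + TPR - FPR) / 2 = (TPR + TNR) / 2.
auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
```

If a threshold-free AUC is intended, scoring on `predict_proba` outputs would be needed instead.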
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.5625 0.76923077 0.91666667 0.61538462 0.83333333
0.6875 0.91666667 0.81818182 0.90909091]
mean value: 0.7778554778554778
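For binary labels, the Jaccard score J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN) are linked by J = F1 / (2 - F1). That is why each `test_jcc` value can be recovered from the matching `test_fscore` value. The check below uses folds 1 and 2 of the Decision Tree run.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the corresponding Jaccard score."""
    return f1 / (2 - f1)

j1 = jaccard_from_f1(0.85714286)  # logged test_jcc for fold 1: 0.75
j2 = jaccard_from_f1(0.72)        # logged test_jcc for fold 2: 0.5625
```

Because J is a monotone function of F1, the two metrics rank models identically, so reporting both adds little extra information.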
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
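The keys above (`fit_time`, `score_time`, `test_*`, `train_*`) match the dictionary returned by scikit-learn's `cross_validate` with a named `scoring` dict and `return_train_score=True`. The sketch below shows one way to produce output in the same format. The exact scorers, the fold strategy (10 stratified, shuffled folds), and the synthetic data are assumptions, not taken from the original script. ROC AUC is scored on hard predictions here, which the logged values appear consistent with.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Assumed scorer set, named to mirror the logged keys.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # uses predict(), i.e. hard labels
    "jcc": make_scorer(jaccard_score),
}

X, y = make_classification(n_samples=200, n_features=10, random_state=42)  # synthetic data
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(DecisionTreeClassifier(random_state=42), X, y,
                        cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

An unpruned decision tree reproduces the all-1.0 `train_*` rows seen above, because it fits each training fold exactly.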
MCC on Blind test: 0.03
Accuracy on Blind test: 0.51
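The blind-test lines report MCC and accuracy, rounded to two decimals, for a model fitted on the training split and scored once on a held-out split. For the Decision Tree, MCC falls from a cross-validated mean of about 0.75 to 0.03 on the blind test. The sketch below shows a generic version of that evaluation step. The synthetic data, the 70/30 stratified split, and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=42)  # synthetic data
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)  # hypothetical 70/30 split

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_blind)

bts_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
bts_acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", bts_mcc)
print("Accuracy on Blind test:", bts_acc)
```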
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.09773445 0.08627319 0.08551788 0.08611631 0.08678699 0.08618236
0.08573103 0.08577466 0.08596325 0.08555889]
mean value: 0.08716390132904053
key: score_time
value: [0.01986361 0.01779342 0.01704097 0.01689577 0.01845622 0.01685524
0.01741266 0.01715446 0.0169692 0.01674438]
mean value: 0.01751859188079834
key: test_mcc
value: [0.74047959 0.76764947 0.56818182 0.82575758 0.82575758 0.91605722
0.65909298 1. 0.83205029 0.83205029]
mean value: 0.7967076829215525
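MCC can be computed directly from confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts below are inferred, not logged. For Extra Trees fold 1, recall 0.818 = 9/11, precision 0.9 = 9/10, and accuracy 20/23 imply TP=9, TN=11, FP=1, FN=2, and these counts reproduce the logged `test_mcc` of 0.74047959.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts (0 if undefined)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

fold1 = mcc(tp=9, tn=11, fp=1, fn=2)  # inferred counts for Extra Trees fold 1
```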
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86956522 0.86956522 0.7826087 0.91304348 0.91304348 0.95652174
0.82608696 1. 0.90909091 0.90909091]
mean value: 0.8948616600790513
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.88 0.7826087 0.90909091 0.91666667 0.96
0.84615385 1. 0.91666667 0.9 ]
mean value: 0.896832964137312
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.78571429 0.75 0.90909091 0.91666667 0.92307692
0.78571429 1. 0.84615385 1. ]
mean value: 0.8816416916416916
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 1. 0.81818182 0.90909091 0.91666667 1.
0.91666667 1. 1. 0.81818182]
mean value: 0.9196969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86742424 0.875 0.78409091 0.91287879 0.91287879 0.95454545
0.8219697 1. 0.90909091 0.90909091]
mean value: 0.8946969696969697
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.78571429 0.64285714 0.83333333 0.84615385 0.92307692
0.73333333 1. 0.84615385 0.81818182]
mean value: 0.8178804528804529
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.62
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00724578 0.00700259 0.00706482 0.00705457 0.00700569 0.00709224
0.00697279 0.00716543 0.00719452 0.00722599]
mean value: 0.007102441787719726
key: score_time
value: [0.00805378 0.00790071 0.00796628 0.00789952 0.00804639 0.00786138
0.00799799 0.00784135 0.00800538 0.00841856]
mean value: 0.007999134063720704
key: test_mcc
value: [ 0.48075018 0.47727273 0.47727273 -0.04545455 0.56490196 0.31298622
0.48075018 0.38932432 0.54772256 0.63636364]
mean value: 0.4321889948051151
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73913043 0.73913043 0.73913043 0.47826087 0.7826087 0.65217391
0.73913043 0.69565217 0.77272727 0.81818182]
mean value: 0.7156126482213438
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7 0.72727273 0.72727273 0.45454545 0.8 0.63636364
0.76923077 0.72 0.76190476 0.81818182]
mean value: 0.7114771894771895
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.72727273 0.72727273 0.45454545 0.76923077 0.7
0.71428571 0.69230769 0.8 0.81818182]
mean value: 0.7180874680874682
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.63636364 0.72727273 0.72727273 0.45454545 0.83333333 0.58333333
0.83333333 0.75 0.72727273 0.81818182]
mean value: 0.7090909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73484848 0.73863636 0.73863636 0.47727273 0.78030303 0.65530303
0.73484848 0.69318182 0.77272727 0.81818182]
mean value: 0.7143939393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.53846154 0.57142857 0.57142857 0.29411765 0.66666667 0.46666667
0.625 0.5625 0.61538462 0.69230769]
mean value: 0.5603961969403146
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.1
Accuracy on Blind test: 0.45
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.09740305 1.13326406 1.09444618 1.15521884 1.09004188 1.09466553
1.09249401 1.09514403 1.09620023 1.09324002]
mean value: 1.1042117834091187
key: score_time
value: [0.08939648 0.09209704 0.08982658 0.08881688 0.0895195 0.08898306
0.0902555 0.0890646 0.09453082 0.08895087]
mean value: 0.09014413356781006
key: test_mcc
value: [0.83743579 0.58930667 0.58930667 1. 0.74242424 0.91666667
0.82575758 1. 1. 0.81818182]
mean value: 0.8319079425560323
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91304348 0.7826087 0.7826087 1. 0.86956522 0.95652174
0.91304348 1. 1. 0.90909091]
mean value: 0.9126482213438735
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.8 0.8 1. 0.86956522 0.95652174
0.91666667 1. 1. 0.90909091]
mean value: 0.9151844532279315
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.71428571 0.71428571 1. 0.90909091 1.
0.91666667 1. 1. 0.90909091]
mean value: 0.9163419913419913
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 0.90909091 0.90909091 1. 0.83333333 0.91666667
0.91666667 1. 1. 0.90909091]
mean value: 0.9212121212121211
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90909091 0.78787879 0.78787879 1. 0.87121212 0.95833333
0.91287879 1. 1. 0.90909091]
mean value: 0.9136363636363636
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.66666667 0.66666667 1. 0.76923077 0.91666667
0.84615385 1. 1. 0.83333333]
mean value: 0.8516899766899767
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.55
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.8280468 0.83096433 0.89360881 0.91945672 0.93384838 0.88851166
0.8646996 0.91655803 0.83419728 0.87553668]
mean value: 0.8785428285598755
key: score_time
value: [0.22438312 0.13418078 0.20978713 0.20942521 0.23700547 0.21879888
0.21615219 0.20876241 0.20610762 0.21533132]
mean value: 0.20799341201782226
key: test_mcc
value: [0.76277007 0.6992059 0.58930667 1. 0.74242424 0.91666667
0.74047959 1. 0.73029674 0.63636364]
mean value: 0.7817513515626897
key: train_mcc
value: [0.97077583 0.961154 0.98067223 0.961154 0.96116136 0.96116136
0.94219063 0.96097468 0.97091955 0.94245853]
mean value: 0.9612622141858389
key: test_accuracy
value: [0.86956522 0.82608696 0.7826087 1. 0.86956522 0.95652174
0.86956522 1. 0.86363636 0.81818182]
mean value: 0.8855731225296443
key: train_accuracy
value: [0.98536585 0.9804878 0.9902439 0.9804878 0.9804878 0.9804878
0.97073171 0.9804878 0.98543689 0.97087379]
mean value: 0.9805091167416529
key: test_fscore
value: [0.84210526 0.84615385 0.8 1. 0.86956522 0.95652174
0.88 1. 0.86956522 0.81818182]
mean value: 0.8882093101406603
key: train_fscore
value: [0.98550725 0.98076923 0.99038462 0.98076923 0.98058252 0.98058252
0.97115385 0.98039216 0.98550725 0.97142857]
mean value: 0.9807077192665552
key: test_precision
value: [1. 0.73333333 0.71428571 1. 0.90909091 1.
0.84615385 1. 0.83333333 0.81818182]
mean value: 0.8854378954378954
key: train_precision
value: [0.98076923 0.97142857 0.98095238 0.97142857 0.97115385 0.97115385
0.95283019 0.98039216 0.98076923 0.95327103]
mean value: 0.9714149051235051
key: test_recall
value: [0.72727273 1. 0.90909091 1. 0.83333333 0.91666667
0.91666667 1. 0.90909091 0.81818182]
mean value: 0.9030303030303031
key: train_recall
value: [0.99029126 0.99029126 1. 0.99029126 0.99019608 0.99019608
0.99019608 0.98039216 0.99029126 0.99029126]
mean value: 0.9902436702836475
key: test_roc_auc
value: [0.86363636 0.83333333 0.78787879 1. 0.87121212 0.95833333
0.86742424 1. 0.86363636 0.81818182]
mean value: 0.8863636363636364
key: train_roc_auc
value: [0.98534171 0.98043975 0.99019608 0.98043975 0.98053493 0.98053493
0.97082619 0.98048734 0.98543689 0.97087379]
mean value: 0.9805111364934324
key: test_jcc
value: [0.72727273 0.73333333 0.66666667 1. 0.76923077 0.91666667
0.78571429 1. 0.76923077 0.69230769]
mean value: 0.8060422910422911
key: train_jcc
value: [0.97142857 0.96226415 0.98095238 0.96226415 0.96190476 0.96190476
0.94392523 0.96153846 0.97142857 0.94444444]
mean value: 0.9622055489133606
MCC on Blind test: 0.26
Accuracy on Blind test: 0.61
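The FutureWarning logged for "Random Forest2" states that `max_features='auto'` is deprecated for `RandomForestClassifier` since scikit-learn 1.1. It also states that `max_features='sqrt'` keeps the previous behaviour. The sketch below uses the same logged hyperparameters with `'sqrt'` substituted, so the warning should not appear. The synthetic data are for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(
    max_features="sqrt",  # replaces the deprecated 'auto' (same behaviour for classifiers)
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

X, y = make_classification(n_samples=200, n_features=20, random_state=42)  # synthetic data
rf2.fit(X, y)
print("OOB score:", rf2.oob_score_)
```

The `min_samples_leaf=5` setting is also the likely reason this model's `train_*` scores sit below 1.0, unlike the unconstrained forest above.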
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01719785 0.00701189 0.00701404 0.00702286 0.0074749 0.00695348
0.00701451 0.00692368 0.00781441 0.00701284]
mean value: 0.008144044876098632
key: score_time
value: [0.01571059 0.00787878 0.00795388 0.00786066 0.00811124 0.00786138
0.00785327 0.00791693 0.00871611 0.00790095]
mean value: 0.008776378631591798
key: test_mcc
value: [0.30240737 0.05427825 0.03816905 0.3030303 0.42228828 0.30240737
0.03816905 0.65151515 0.37796447 0.27272727]
mean value: 0.2762956564879194
key: train_mcc
value: [0.35623111 0.34638101 0.37560698 0.3463735 0.28783552 0.36612372
0.35687769 0.3658258 0.32044877 0.39058328]
mean value: 0.3512287378448915
key: test_accuracy
value: [0.65217391 0.52173913 0.52173913 0.65217391 0.69565217 0.65217391
0.52173913 0.82608696 0.68181818 0.63636364]
mean value: 0.6361660079051383
key: train_accuracy
value: [0.67804878 0.67317073 0.68780488 0.67317073 0.64390244 0.68292683
0.67804878 0.68292683 0.66019417 0.69417476]
mean value: 0.6754368932038836
key: test_fscore
value: [0.6 0.56 0.47619048 0.63636364 0.75862069 0.69230769
0.56 0.83333333 0.72 0.63636364]
mean value: 0.6473179464213947
key: train_fscore
value: [0.68571429 0.67942584 0.69230769 0.67317073 0.64390244 0.67336683
0.68571429 0.67980296 0.65686275 0.70967742]
mean value: 0.6779945226077302
key: test_precision
value: [0.66666667 0.5 0.5 0.63636364 0.64705882 0.64285714
0.53846154 0.83333333 0.64285714 0.63636364]
mean value: 0.6243961920432509
key: train_precision
value: [0.6728972 0.66981132 0.68571429 0.67647059 0.6407767 0.69072165
0.66666667 0.68316832 0.66336634 0.6754386 ]
mean value: 0.6725031656102882
key: test_recall
value: [0.54545455 0.63636364 0.45454545 0.63636364 0.91666667 0.75
0.58333333 0.83333333 0.81818182 0.63636364]
mean value: 0.681060606060606
key: train_recall
value: [0.69902913 0.68932039 0.69902913 0.66990291 0.64705882 0.65686275
0.70588235 0.67647059 0.65048544 0.74757282]
mean value: 0.6841614315629164
key: test_roc_auc
value: [0.64772727 0.52651515 0.51893939 0.65151515 0.68560606 0.64772727
0.51893939 0.82575758 0.68181818 0.63636364]
mean value: 0.634090909090909
key: train_roc_auc
value: [0.67794594 0.67309157 0.68774986 0.67318675 0.64391776 0.6828003
0.67818389 0.68289549 0.66019417 0.69417476]
mean value: 0.6754140491147915
key: test_jcc
value: [0.42857143 0.38888889 0.3125 0.46666667 0.61111111 0.52941176
0.38888889 0.71428571 0.5625 0.46666667]
mean value: 0.4869491129785247
key: train_jcc
value: [0.52173913 0.51449275 0.52941176 0.50735294 0.47482014 0.50757576
0.52173913 0.51492537 0.48905109 0.55 ]
mean value: 0.5131108089860595
MCC on Blind test: 0.35
Accuracy on Blind test: 0.67
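`BernoulliNB` binarizes its inputs with `binarize=0.0` by default, mapping each value to 1 if it is greater than 0.0 and to 0 otherwise. In this pipeline it receives `MinMaxScaler` output, where only each column's minimum maps to 0. Most scaled numerical values therefore become 1, which may discard much of their information and could partly explain the weaker cross-validation scores for this model. The sketch below reproduces the thresholding on toy values. Tuning `binarize` or discretizing the features differently would be alternatives to explore.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, binarize

X = np.array([[3.1, -1.2], [10.5, 0.4], [7.2, 2.1], [15.0, -0.3]])  # toy values
X_scaled = MinMaxScaler().fit_transform(X)   # each column's minimum becomes 0.0

# Same rule BernoulliNB(binarize=0.0) applies internally: 1 if x > 0.0, else 0.
X_bin = binarize(X_scaled, threshold=0.0)
print(X_bin)
```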
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.08559012 0.14609933 0.03756452 0.03884339 0.04227829 0.08023286
0.03739309 0.06376338 0.03956223 0.03932667]
mean value: 0.061065387725830075
key: score_time
value: [0.01105213 0.01027513 0.01021671 0.00977039 0.00970769 0.01002121
0.00953698 0.00958061 0.00958157 0.00957847]
mean value: 0.00993208885192871
key: test_mcc
value: [0.83743579 0.58930667 0.66414149 0.91605722 0.74242424 0.91666667
0.91605722 1. 1. 0.81818182]
mean value: 0.8400271120875227
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91304348 0.7826087 0.82608696 0.95652174 0.86956522 0.95652174
0.95652174 1. 1. 0.90909091]
mean value: 0.9169960474308301
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9 0.8 0.83333333 0.95238095 0.86956522 0.95652174
0.96 1. 1. 0.90909091]
mean value: 0.9180892151326934
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.71428571 0.76923077 1. 0.90909091 1.
0.92307692 1. 1. 0.90909091]
mean value: 0.9224775224775225
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667
1. 1. 1. 0.90909091]
mean value: 0.9204545454545454
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90909091 0.78787879 0.82954545 0.95454545 0.87121212 0.95833333
0.95454545 1. 1. 0.90909091]
mean value: 0.9174242424242425
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.81818182 0.66666667 0.71428571 0.90909091 0.76923077 0.91666667
0.92307692 1. 1. 0.83333333]
mean value: 0.8550532800532801
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.53
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01432657 0.03284144 0.03134632 0.03231716 0.03199744 0.03253031
0.03206563 0.03227401 0.03221512 0.03237677]
mean value: 0.0304290771484375
key: score_time
value: [0.0105865 0.02112436 0.02062201 0.0215013 0.01899457 0.01899886
0.01989794 0.02078581 0.01071429 0.02165031]
mean value: 0.01848759651184082
key: test_mcc
value: [0.48075018 0.65151515 0.39393939 1. 0.66414149 0.91666667
0.58002308 0.91666667 0.75592895 0.81818182]
mean value: 0.7177813381485896
key: train_mcc
value: [0.90310636 0.89271776 0.91224062 0.86358877 0.87320324 0.88292404
0.89271776 0.86341138 0.89358299 0.84481947]
mean value: 0.882231240068856
key: test_accuracy
value: [0.73913043 0.82608696 0.69565217 1. 0.82608696 0.95652174
0.7826087 0.95652174 0.86363636 0.90909091]
mean value: 0.8555335968379447
key: train_accuracy
value: [0.95121951 0.94634146 0.95609756 0.93170732 0.93658537 0.94146341
0.94634146 0.93170732 0.94660194 0.9223301 ]
mean value: 0.9410395453469098
key: test_fscore
value: [0.7 0.81818182 0.69565217 1. 0.81818182 0.95652174
0.81481481 0.95652174 0.84210526 0.90909091]
mean value: 0.8511070275601168
key: train_fscore
value: [0.95238095 0.9468599 0.95609756 0.93137255 0.93596059 0.94117647
0.94581281 0.93137255 0.94736842 0.92156863]
mean value: 0.9409970432884046
key: test_precision
value: [0.77777778 0.81818182 0.66666667 1. 0.9 1.
0.73333333 1. 1. 0.90909091]
mean value: 0.8805050505050505
key: train_precision
value: [0.93457944 0.94230769 0.96078431 0.94059406 0.94059406 0.94117647
0.95049505 0.93137255 0.93396226 0.93069307]
mean value: 0.9406558966668068
key: test_recall
value: [0.63636364 0.81818182 0.72727273 1. 0.75 0.91666667
0.91666667 0.91666667 0.72727273 0.90909091]
mean value: 0.8318181818181818
key: train_recall
value: [0.97087379 0.95145631 0.95145631 0.9223301 0.93137255 0.94117647
0.94117647 0.93137255 0.96116505 0.91262136]
mean value: 0.9415000951837046
key: test_roc_auc
value: [0.73484848 0.82575758 0.6969697 1. 0.82954545 0.95833333
0.77651515 0.95833333 0.86363636 0.90909091]
mean value: 0.8553030303030302
key: train_roc_auc
value: [0.95112317 0.94631639 0.95612031 0.93175328 0.93656006 0.94146202
0.94631639 0.93170569 0.94660194 0.9223301 ]
mean value: 0.9410289358461832
key: test_jcc
value: [0.53846154 0.69230769 0.53333333 1. 0.69230769 0.91666667
0.6875 0.91666667 0.72727273 0.83333333]
mean value: 0.7537849650349651
key: train_jcc
value: [0.90909091 0.89908257 0.91588785 0.87155963 0.87962963 0.88888889
0.89719626 0.87155963 0.9 0.85454545]
mean value: 0.8887440829166801
MCC on Blind test: 0.07
Accuracy on Blind test: 0.53
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02144146 0.00733709 0.00703239 0.00698829 0.00709367 0.0077188
0.00785375 0.00805616 0.00777936 0.00778151]
mean value: 0.008908247947692871
key: score_time
value: [0.00882101 0.00820684 0.00806975 0.00794053 0.00805497 0.00866318
0.00886822 0.00857925 0.00864053 0.00864553]
mean value: 0.008448982238769531
key: test_mcc
value: [0.30240737 0.05427825 0.21969697 0.3030303 0.39727608 0.30240737
0.39393939 0.56818182 0.29277002 0.09090909]
mean value: 0.29248966588600495
key: train_mcc
value: [0.37650652 0.40495245 0.36648346 0.34638101 0.41611143 0.36642547
0.29790481 0.32736295 0.38836782 0.33048671]
mean value: 0.3620982636826904
key: test_accuracy
value: [0.65217391 0.52173913 0.60869565 0.65217391 0.69565217 0.65217391
0.69565217 0.7826087 0.63636364 0.54545455]
mean value: 0.6442687747035574
key: train_accuracy
value: [0.68780488 0.70243902 0.68292683 0.67317073 0.70731707 0.68292683
0.64878049 0.66341463 0.69417476 0.66504854]
mean value: 0.6808003788775752
key: test_fscore
value: [0.6 0.56 0.60869565 0.63636364 0.74074074 0.69230769
0.69565217 0.7826087 0.69230769 0.54545455]
mean value: 0.6554130828913438
key: train_fscore
value: [0.70093458 0.70813397 0.69483568 0.67942584 0.71698113 0.68899522
0.65384615 0.66985646 0.69565217 0.67298578]
mean value: 0.6881646985269205
key: test_precision
value: [0.66666667 0.5 0.58333333 0.63636364 0.66666667 0.64285714
0.72727273 0.81818182 0.6 0.54545455]
mean value: 0.6386796536796537
key: train_precision
value: [0.67567568 0.69811321 0.67272727 0.66981132 0.69090909 0.6728972
0.64150943 0.65420561 0.69230769 0.65740741]
mean value: 0.6725563905029608
key: test_recall
value: [0.54545455 0.63636364 0.63636364 0.63636364 0.83333333 0.75
0.66666667 0.75 0.81818182 0.54545455]
mean value: 0.6818181818181818
key: train_recall
value: [0.72815534 0.7184466 0.7184466 0.68932039 0.74509804 0.70588235
0.66666667 0.68627451 0.69902913 0.68932039]
mean value: 0.7046640015229393
key: test_roc_auc
value: [0.64772727 0.52651515 0.60984848 0.65151515 0.68939394 0.64772727
0.6969697 0.78409091 0.63636364 0.54545455]
mean value: 0.643560606060606
key: train_roc_auc
value: [0.68760708 0.70236056 0.68275271 0.67309157 0.70750048 0.68303826
0.64886731 0.6635256 0.69417476 0.66504854]
mean value: 0.6807966876070817
key: test_jcc
value: [0.42857143 0.38888889 0.4375 0.46666667 0.58823529 0.52941176
0.53333333 0.64285714 0.52941176 0.375 ]
mean value: 0.49198762838468724
key: train_jcc
value: [0.53956835 0.54814815 0.5323741 0.51449275 0.55882353 0.52554745
0.48571429 0.50359712 0.53333333 0.50714286]
mean value: 0.5248741920974376
MCC on Blind test: 0.39
Accuracy on Blind test: 0.69
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00869298 0.01086235 0.01095939 0.01100326 0.01040626 0.01017833
0.01074862 0.01032758 0.01047254 0.01042032]
mean value: 0.010407161712646485
key: score_time
value: [0.00884461 0.01061177 0.01056266 0.01120257 0.01041794 0.01044512
0.01040959 0.01039767 0.01039672 0.0103991 ]
mean value: 0.010368776321411134
key: test_mcc
value: [0.65909298 0.47923384 0.5164589 0.69084928 0.74242424 0.74242424
0.76277007 0.69084928 0.83205029 0.54232614]
mean value: 0.6658479277717093
key: train_mcc
value: [0.8373082 0.60342152 0.90672005 0.74004127 0.84982541 0.84787319
0.87166073 0.56519801 0.82977382 0.63500064]
mean value: 0.7686822837406401
key: test_accuracy
value: [0.82608696 0.69565217 0.73913043 0.82608696 0.86956522 0.86956522
0.86956522 0.82608696 0.90909091 0.72727273]
mean value: 0.8158102766798419
key: train_accuracy
value: [0.91707317 0.77073171 0.95121951 0.85365854 0.92195122 0.92195122
0.93170732 0.74146341 0.90776699 0.79126214]
mean value: 0.8708785223774568
key: test_fscore
value: [0.8 0.53333333 0.76923077 0.77777778 0.86956522 0.86956522
0.88888889 0.85714286 0.91666667 0.625 ]
mean value: 0.7907170727822902
key: train_fscore
value: [0.92093023 0.70807453 0.9537037 0.82954545 0.92592593 0.91752577
0.93577982 0.79377432 0.91555556 0.73939394]
mean value: 0.8640209254619995
key: test_precision
value: [0.88888889 1. 0.66666667 1. 0.90909091 0.90909091
0.8 0.75 0.84615385 1. ]
mean value: 0.8769891219891219
key: train_precision
value: [0.88392857 0.98275862 0.91150442 1. 0.87719298 0.9673913
0.87931034 0.65806452 0.8442623 0.98387097]
mean value: 0.8988284027481475
key: test_recall
value: [0.72727273 0.36363636 0.90909091 0.63636364 0.83333333 0.83333333
1. 1. 1. 0.45454545]
mean value: 0.7757575757575758
key: train_recall
value: [0.96116505 0.55339806 1. 0.70873786 0.98039216 0.87254902
1. 1. 1. 0.59223301]
mean value: 0.8668475157053113
key: test_roc_auc
value: [0.8219697 0.68181818 0.74621212 0.81818182 0.87121212 0.87121212
0.86363636 0.81818182 0.90909091 0.72727273]
mean value: 0.8128787878787879
key: train_roc_auc
value: [0.91685703 0.77179707 0.95098039 0.85436893 0.92223491 0.9217114
0.93203883 0.74271845 0.90776699 0.79126214]
mean value: 0.8711736150770988
key: test_jcc
value: [0.66666667 0.36363636 0.625 0.63636364 0.76923077 0.76923077
0.8 0.75 0.84615385 0.45454545]
mean value: 0.6680827505827506
key: train_jcc
value: [0.85344828 0.54807692 0.91150442 0.70873786 0.86206897 0.84761905
0.87931034 0.65806452 0.8442623 0.58653846]
mean value: 0.769963111850876
MCC on Blind test: 0.26
Accuracy on Blind test: 0.63
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01068044 0.01032329 0.01064396 0.01066589 0.01067901 0.01030278
0.01019835 0.01055288 0.01074505 0.01002216]
mean value: 0.0104813814163208
key: score_time
value: [0.01096964 0.01078129 0.01076531 0.01084399 0.01041245 0.01038766
0.01039839 0.0104928 0.01040888 0.01040125]
mean value: 0.010586166381835937
key: test_mcc
value: [0.56490196 0.40451992 0.33371191 0.74047959 0.74242424 0.58002308
0.74242424 0.91666667 1. 0.31622777]
mean value: 0.6341379361751176
key: train_mcc
value: [0.84083863 0.65525342 0.84965937 0.84102851 0.91330072 0.82620413
0.82825757 0.85400014 0.87581131 0.46017899]
mean value: 0.7944532795671018
key: test_accuracy
value: [0.7826087 0.65217391 0.65217391 0.86956522 0.86956522 0.7826087
0.86956522 0.95652174 1. 0.59090909]
mean value: 0.8025691699604743
key: train_accuracy
value: [0.91707317 0.8 0.92195122 0.91707317 0.95609756 0.90731707
0.91219512 0.92682927 0.9368932 0.67475728]
mean value: 0.8870187070802747
key: test_fscore
value: [0.76190476 0.42857143 0.69230769 0.85714286 0.86956522 0.81481481
0.86956522 0.95652174 1. 0.30769231]
mean value: 0.7558086036346906
key: train_fscore
value: [0.92237443 0.75151515 0.9266055 0.9119171 0.9569378 0.91402715
0.90721649 0.92537313 0.93896714 0.51798561]
mean value: 0.8672919508970722
key: test_precision
value: [0.8 1. 0.6 0.9 0.90909091 0.73333333
0.90909091 1. 1. 1. ]
mean value: 0.8851515151515151
key: train_precision
value: [0.87068966 1. 0.87826087 0.97777778 0.93457944 0.8487395
0.95652174 0.93939394 0.90909091 1. ]
mean value: 0.9315053825181348
key: test_recall
value: [0.72727273 0.27272727 0.81818182 0.81818182 0.83333333 0.91666667
0.83333333 0.91666667 1. 0.18181818]
mean value: 0.7318181818181818
key: train_recall
value: [0.98058252 0.60194175 0.98058252 0.85436893 0.98039216 0.99019608
0.8627451 0.91176471 0.97087379 0.34951456]
mean value: 0.848296211688559
key: test_roc_auc
value: [0.78030303 0.63636364 0.65909091 0.86742424 0.87121212 0.77651515
0.87121212 0.95833333 1. 0.59090909]
mean value: 0.8011363636363636
key: train_roc_auc
value: [0.91676185 0.80097087 0.92166381 0.91738054 0.9562155 0.9077194
0.91195507 0.92675614 0.9368932 0.67475728]
mean value: 0.8871073672187322
key: test_jcc
value: [0.61538462 0.27272727 0.52941176 0.75 0.76923077 0.6875
0.76923077 0.91666667 1. 0.18181818]
mean value: 0.6491970039764158
key: train_jcc
value: [0.8559322 0.60194175 0.86324786 0.83809524 0.91743119 0.84166667
0.83018868 0.86111111 0.88495575 0.34951456]
mean value: 0.7844085017308544
MCC on Blind test: 0.21
Accuracy on Blind test: 0.61
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.08768725 0.0763936 0.07663751 0.07690692 0.07599807 0.07673168
0.07730794 0.07686925 0.07642341 0.07606792]
mean value: 0.07770235538482666
key: score_time
value: [0.01560497 0.01552868 0.0159018 0.01562333 0.01548648 0.01561403
0.01570988 0.0160079 0.01545119 0.0155468 ]
mean value: 0.015647506713867186
key: test_mcc
value: [0.91605722 0.6992059 0.74242424 0.83743579 0.74242424 0.91666667
0.91605722 0.83971912 1. 1. ]
mean value: 0.8609990412070886
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95652174 0.82608696 0.86956522 0.91304348 0.86956522 0.95652174
0.95652174 0.91304348 1. 1. ]
mean value: 0.9260869565217391
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95238095 0.84615385 0.86956522 0.9 0.86956522 0.95652174
0.96 0.90909091 1. 1. ]
mean value: 0.9263277881538751
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.73333333 0.83333333 1. 0.90909091 1.
0.92307692 1. 1. 1. ]
mean value: 0.9398834498834499
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 1. 0.90909091 0.81818182 0.83333333 0.91666667
1. 0.83333333 1. 1. ]
mean value: 0.921969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.83333333 0.87121212 0.90909091 0.87121212 0.95833333
0.95454545 0.91666667 1. 1. ]
mean value: 0.9268939393939394
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90909091 0.73333333 0.76923077 0.81818182 0.76923077 0.91666667
0.92307692 0.83333333 1. 1. ]
mean value: 0.8672144522144523
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.02
Accuracy on Blind test: 0.49
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03209019 0.02829218 0.03167534 0.03128099 0.03293514 0.02934527
0.03381133 0.02705407 0.03058004 0.02823019]
mean value: 0.030529475212097167
key: score_time
value: [0.01726365 0.02387595 0.02088284 0.02278829 0.02153826 0.01731133
0.01608276 0.01754308 0.02704978 0.01536942]
mean value: 0.01997053623199463
key: test_mcc
value: [0.83743579 0.39393939 0.66414149 1. 0.74242424 0.91666667
0.91605722 1. 0.91287093 0.81818182]
mean value: 0.820171755257551
key: train_mcc
value: [0.98048734 0.99029034 0.99029034 0.99029126 0.98067223 0.98048734
0.99029034 0.99029034 0.96189066 0.99033794]
mean value: 0.9845328141111404
key: test_accuracy
value: [0.91304348 0.69565217 0.82608696 1. 0.86956522 0.95652174
0.95652174 1. 0.95454545 0.90909091]
mean value: 0.908102766798419
key: train_accuracy
value: [0.9902439 0.99512195 0.99512195 0.99512195 0.9902439 0.9902439
0.99512195 0.99512195 0.98058252 0.99514563]
mean value: 0.992206961875444
key: test_fscore
value: [0.9 0.69565217 0.83333333 1. 0.86956522 0.95652174
0.96 1. 0.95238095 0.90909091]
mean value: 0.9076544325239977
key: train_fscore
value: [0.99029126 0.99516908 0.99516908 0.99512195 0.99009901 0.99019608
0.99507389 0.99507389 0.98019802 0.99512195]
mean value: 0.9921514220211729
key: test_precision
value: [1. 0.66666667 0.76923077 1. 0.90909091 1.
0.92307692 1. 1. 0.90909091]
mean value: 0.9177156177156177
key: train_precision
value: [0.99029126 0.99038462 0.99038462 1. 1. 0.99019608
1. 1. 1. 1. ]
mean value: 0.9961256571336525
key: test_recall
value: [0.81818182 0.72727273 0.90909091 1. 0.83333333 0.91666667
1. 1. 0.90909091 0.90909091]
mean value: 0.9022727272727272
key: train_recall
value: [0.99029126 1. 1. 0.99029126 0.98039216 0.99019608
0.99019608 0.99019608 0.96116505 0.99029126]
mean value: 0.988301922710832
key: test_roc_auc
value: [0.90909091 0.6969697 0.82954545 1. 0.87121212 0.95833333
0.95454545 1. 0.95454545 0.90909091]
mean value: 0.9083333333333333
key: train_roc_auc
value: [0.99024367 0.99509804 0.99509804 0.99514563 0.99019608 0.99024367
0.99509804 0.99509804 0.98058252 0.99514563]
mean value: 0.992194936226918
key: test_jcc
value: [0.81818182 0.53333333 0.71428571 1. 0.76923077 0.91666667
0.92307692 1. 0.90909091 0.83333333]
mean value: 0.8417199467199468
key: train_jcc
value: [0.98076923 0.99038462 0.99038462 0.99029126 0.98039216 0.98058252
0.99019608 0.99019608 0.96116505 0.99029126]
mean value: 0.984465287235133
MCC on Blind test: 0.07
Accuracy on Blind test: 0.53
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.0418961 0.05101323 0.05104518 0.05136704 0.05105376 0.05128527
0.05111003 0.04877734 0.05096364 0.04899049]
mean value: 0.04975020885467529
key: score_time
value: [0.02236867 0.01667118 0.022753 0.02080393 0.02091503 0.02079797
0.01660824 0.01153994 0.01939178 0.01141834]
mean value: 0.018326807022094726
key: test_mcc
value: [0.12336594 0.39393939 0.56490196 0.39727608 0.74047959 0.41096386
0.58930667 0.65151515 0.48795004 0.2773501 ]
mean value: 0.46370487732579513
key: train_mcc
value: [0.95126131 0.92355447 0.90401389 0.93283198 0.93209539 0.92211753
0.91257158 0.91325992 0.90308289 0.89358299]
mean value: 0.9188371932639088
key: test_accuracy
value: [0.56521739 0.69565217 0.7826087 0.69565217 0.86956522 0.69565217
0.7826087 0.82608696 0.72727273 0.63636364]
mean value: 0.7276679841897233
key: train_accuracy
value: [0.97560976 0.96097561 0.95121951 0.96585366 0.96585366 0.96097561
0.95609756 0.95609756 0.95145631 0.94660194]
mean value: 0.9590741179256452
key: test_fscore
value: [0.44444444 0.69565217 0.76190476 0.63157895 0.88 0.66666667
0.76190476 0.83333333 0.66666667 0.6 ]
mean value: 0.69421517562021
key: train_fscore
value: [0.97584541 0.96 0.95 0.96517413 0.96517413 0.96039604
0.95522388 0.95477387 0.95098039 0.94581281]
mean value: 0.9583380658920831
key: test_precision
value: [0.57142857 0.66666667 0.8 0.75 0.84615385 0.77777778
0.88888889 0.83333333 0.85714286 0.66666667]
mean value: 0.7658058608058608
key: train_precision
value: [0.97115385 0.98969072 0.97938144 0.98979592 0.97979798 0.97
0.96969697 0.97938144 0.96039604 0.96 ]
mean value: 0.9749294361867525
key: test_recall
value: [0.36363636 0.72727273 0.72727273 0.54545455 0.91666667 0.58333333
0.66666667 0.83333333 0.54545455 0.54545455]
mean value: 0.6454545454545455
key: train_recall
value: [0.98058252 0.93203883 0.9223301 0.94174757 0.95098039 0.95098039
0.94117647 0.93137255 0.94174757 0.93203883]
mean value: 0.9424995240814773
key: test_roc_auc
value: [0.55681818 0.6969697 0.78030303 0.68939394 0.86742424 0.70075758
0.78787879 0.82575758 0.72727273 0.63636364]
mean value: 0.7268939393939393
key: train_roc_auc
value: [0.97558538 0.96111746 0.95136113 0.96597183 0.96578146 0.96092709
0.95602513 0.95597754 0.95145631 0.94660194]
mean value: 0.9590805254140491
key: test_jcc
value: [0.28571429 0.53333333 0.61538462 0.46153846 0.78571429 0.5
0.61538462 0.71428571 0.5 0.42857143]
mean value: 0.543992673992674
key: train_jcc
value: [0.95283019 0.92307692 0.9047619 0.93269231 0.93269231 0.92380952
0.91428571 0.91346154 0.90654206 0.89719626]
mean value: 0.9201348726216474
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.15189385 0.14156413 0.14368677 0.14271593 0.13952732 0.14208126
0.13928676 0.1380887 0.14191723 0.14071012]
mean value: 0.14214720726013183
key: score_time
value: [0.00953889 0.00933433 0.0094347 0.00952435 0.00953817 0.00918531
0.00900173 0.00848866 0.00957823 0.00928164]
mean value: 0.009290599822998047
key: test_mcc
value: [0.76277007 0.5164589 0.66414149 0.83743579 0.74242424 0.91666667
0.91605722 0.91666667 1. 0.81818182]
mean value: 0.8090802870032009
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86956522 0.73913043 0.82608696 0.91304348 0.86956522 0.95652174
0.95652174 0.95652174 1. 0.90909091]
mean value: 0.899604743083004
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.84210526 0.76923077 0.83333333 0.9 0.86956522 0.95652174
0.96 0.95652174 1. 0.90909091]
mean value: 0.8996368970465081
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.66666667 0.76923077 1. 0.90909091 1.
0.92307692 1. 1. 0.90909091]
mean value: 0.9177156177156177
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667
1. 0.91666667 1. 0.90909091]
mean value: 0.8939393939393939
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86363636 0.74621212 0.82954545 0.90909091 0.87121212 0.95833333
0.95454545 0.95833333 1. 0.90909091]
mean value: 0.9
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72727273 0.625 0.71428571 0.81818182 0.76923077 0.91666667
0.92307692 0.91666667 1. 0.83333333]
mean value: 0.8243714618714619
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.54
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01663566 0.01202774 0.01275182 0.01185322 0.01200604 0.01168084
0.01452398 0.0116775 0.01183033 0.01218224]
mean value: 0.012716937065124511
key: score_time
value: [0.01137972 0.01086974 0.01099038 0.01081109 0.01094103 0.01092815
0.01124692 0.01074839 0.01087403 0.01108217]
mean value: 0.010987162590026855
key: test_mcc
value: [0.15096491 0.56879646 0.29359034 0.33371191 0.55048188 0.65909298
0.40451992 0.55048188 0.56694671 0.54232614]
mean value: 0.4620913131591764
key: train_mcc
value: [0.58647158 0.65859127 0.56715421 0.63490794 0.52720108 0.49387839
0.4975669 0.55024014 0.59539971 0.63353022]
mean value: 0.5744941432717103
key: test_accuracy
value: [0.56521739 0.73913043 0.60869565 0.65217391 0.73913043 0.82608696
0.65217391 0.73913043 0.77272727 0.72727273]
mean value: 0.7021739130434782
key: train_accuracy
value: [0.76097561 0.8097561 0.76585366 0.79512195 0.72195122 0.73170732
0.69756098 0.73170732 0.77669903 0.78640777]
mean value: 0.7577740942457968
key: test_fscore
value: [0.61538462 0.78571429 0.68965517 0.69230769 0.8 0.84615385
0.75 0.8 0.8 0.78571429]
mean value: 0.7564929897688518
key: train_fscore
value: [0.80632411 0.83817427 0.80165289 0.82786885 0.77992278 0.76987448
0.76691729 0.78764479 0.81147541 0.824 ]
mean value: 0.8013854877176021
key: test_precision
value: [0.53333333 0.64705882 0.55555556 0.6 0.66666667 0.78571429
0.6 0.66666667 0.71428571 0.64705882]
mean value: 0.6416339869281046
key: train_precision
value: [0.68 0.73188406 0.69784173 0.71631206 0.6433121 0.67153285
0.62195122 0.64968153 0.70212766 0.70068027]
mean value: 0.6815323469811392
key: test_recall
value: [0.72727273 1. 0.90909091 0.81818182 1. 0.91666667
1. 1. 0.90909091 1. ]
mean value: 0.928030303030303
key: train_recall
value: [0.99029126 0.98058252 0.94174757 0.98058252 0.99019608 0.90196078
1. 1. 0.96116505 1. ]
mean value: 0.9746525794783933
key: test_roc_auc
value: [0.5719697 0.75 0.62121212 0.65909091 0.72727273 0.8219697
0.63636364 0.72727273 0.77272727 0.72727273]
mean value: 0.7015151515151515
key: train_roc_auc
value: [0.75985151 0.80891871 0.76499143 0.79421283 0.72325338 0.73253379
0.69902913 0.73300971 0.77669903 0.78640777]
mean value: 0.7578907291071768
key: test_jcc
value: [0.44444444 0.64705882 0.52631579 0.52941176 0.66666667 0.73333333
0.6 0.66666667 0.66666667 0.64705882]
mean value: 0.6127622979016167
key: train_jcc
value: [0.67549669 0.72142857 0.66896552 0.70629371 0.63924051 0.62585034
0.62195122 0.64968153 0.68275862 0.70068027]
mean value: 0.6692346971143661
MCC on Blind test: 0.36
Accuracy on Blind test: 0.61
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01373434 0.01035643 0.01028872 0.01046038 0.01034403 0.01036429
0.01039052 0.01046538 0.01036501 0.01036739]
mean value: 0.010713648796081544
key: score_time
value: [0.01092839 0.01048851 0.01048732 0.01047397 0.01050997 0.01047373
0.01039219 0.0105195 0.01046062 0.01039433]
mean value: 0.010512852668762207
key: test_mcc
value: [0.62050523 0.74242424 0.47727273 0.91605722 0.66414149 0.82575758
0.82575758 0.83971912 0.91287093 0.73029674]
mean value: 0.7554802857310763
key: train_mcc
value: [0.87320324 0.85404174 0.8742382 0.82504775 0.87320324 0.83447633
0.86356283 0.85368872 0.86424061 0.86424061]
mean value: 0.8579943273085286
key: test_accuracy
value: [0.7826087 0.86956522 0.73913043 0.95652174 0.82608696 0.91304348
0.91304348 0.91304348 0.95454545 0.86363636]
mean value: 0.8731225296442687
key: train_accuracy
value: [0.93658537 0.92682927 0.93658537 0.91219512 0.93658537 0.91707317
0.93170732 0.92682927 0.93203883 0.93203883]
mean value: 0.9288467913805352
key: test_fscore
value: [0.70588235 0.86956522 0.72727273 0.95238095 0.81818182 0.91666667
0.91666667 0.90909091 0.95238095 0.85714286]
mean value: 0.862523112011603
key: train_fscore
value: [0.93719807 0.92610837 0.93532338 0.91089109 0.93596059 0.91542289
0.93069307 0.92610837 0.93137255 0.93137255]
mean value: 0.9280450932646102
key: test_precision
value: [1. 0.83333333 0.72727273 1. 0.9 0.91666667
0.91666667 1. 1. 0.9 ]
mean value: 0.9193939393939394
key: train_precision
value: [0.93269231 0.94 0.95918367 0.92929293 0.94059406 0.92929293
0.94 0.93069307 0.94059406 0.94059406]
mean value: 0.9382937087272306
key: test_recall
value: [0.54545455 0.90909091 0.72727273 0.90909091 0.75 0.91666667
0.91666667 0.83333333 0.90909091 0.81818182]
mean value: 0.8234848484848485
key: train_recall
value: [0.94174757 0.91262136 0.91262136 0.89320388 0.93137255 0.90196078
0.92156863 0.92156863 0.9223301 0.9223301 ]
mean value: 0.9181324957167333
key: test_roc_auc
value: [0.77272727 0.87121212 0.73863636 0.95454545 0.82954545 0.91287879
0.91287879 0.91666667 0.95454545 0.86363636]
mean value: 0.8727272727272727
key: train_roc_auc
value: [0.93656006 0.92689891 0.93670284 0.91228822 0.93656006 0.91699981
0.9316581 0.92680373 0.93203883 0.93203883]
mean value: 0.9288549400342662
key: test_jcc
value: [0.54545455 0.76923077 0.57142857 0.90909091 0.69230769 0.84615385
0.84615385 0.83333333 0.90909091 0.75 ]
mean value: 0.7672244422244422
key: train_jcc
value: [0.88181818 0.86238532 0.87850467 0.83636364 0.87962963 0.8440367
0.87037037 0.86238532 0.87155963 0.87155963]
mean value: 0.8658613096583602
MCC on Blind test: 0.21
Accuracy on Blind test: 0.6
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.08771014 0.08218527 0.08186293 0.08213329 0.14164805 0.1171093
0.08225846 0.0825932 0.08844352 0.08227658]
mean value: 0.09282207489013672
key: score_time
value: [0.01075149 0.01060557 0.01062846 0.01066661 0.01069403 0.01066709
0.01072025 0.01071143 0.01065779 0.01065183]
mean value: 0.010675454139709472
key: test_mcc
value: [0.69084928 0.74242424 0.39393939 1. 0.66414149 0.91666667
0.65909298 0.91666667 0.91287093 0.91287093]
mean value: 0.7809522578121664
key: train_mcc
value: [0.87321531 0.85404174 0.90261781 0.85404174 0.87320324 0.88292404
0.88308106 0.86341138 0.87382759 0.8544092 ]
mean value: 0.8714773106645599
key: test_accuracy
value: [0.82608696 0.86956522 0.69565217 1. 0.82608696 0.95652174
0.82608696 0.95652174 0.95454545 0.95454545]
mean value: 0.8865612648221344
key: train_accuracy
value: [0.93658537 0.92682927 0.95121951 0.92682927 0.93658537 0.94146341
0.94146341 0.93170732 0.9368932 0.92718447]
mean value: 0.9356760596732181
key: test_fscore
value: [0.77777778 0.86956522 0.69565217 1. 0.81818182 0.95652174
0.84615385 0.95652174 0.95238095 0.95652174]
mean value: 0.8829277003190047
key: train_fscore
value: [0.93658537 0.92610837 0.95098039 0.92610837 0.93596059 0.94117647
0.94059406 0.93137255 0.93658537 0.92682927]
mean value: 0.9352300811072124
key: test_precision
value: [1. 0.83333333 0.66666667 1. 0.9 1.
0.78571429 1. 1. 0.91666667]
mean value: 0.9102380952380953
key: train_precision
value: [0.94117647 0.94 0.96039604 0.94 0.94059406 0.94117647
0.95 0.93137255 0.94117647 0.93137255]
mean value: 0.9417264608813822
key: test_recall
value: [0.63636364 0.90909091 0.72727273 1. 0.75 0.91666667
0.91666667 0.91666667 0.90909091 1. ]
mean value: 0.8681818181818182
key: train_recall
value: [0.93203883 0.91262136 0.94174757 0.91262136 0.93137255 0.94117647
0.93137255 0.93137255 0.93203883 0.9223301 ]
mean value: 0.9288692175899487
key: test_roc_auc
value: [0.81818182 0.87121212 0.6969697 1. 0.82954545 0.95833333
0.8219697 0.95833333 0.95454545 0.95454545]
mean value: 0.8863636363636364
key: train_roc_auc
value: [0.93660765 0.92689891 0.95126594 0.92689891 0.93656006 0.94146202
0.94141443 0.93170569 0.9368932 0.92718447]
mean value: 0.9356891300209405
key: test_jcc
value: [0.63636364 0.76923077 0.53333333 1. 0.69230769 0.91666667
0.73333333 0.91666667 0.90909091 0.91666667]
mean value: 0.8023659673659673
key: train_jcc
value: [0.88073394 0.86238532 0.90654206 0.86238532 0.87962963 0.88888889
0.88785047 0.87155963 0.88073394 0.86363636]
mean value: 0.8784345570656983
MCC on Blind test: 0.09
Accuracy on Blind test: 0.54
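The key/value blocks above match the dictionary that scikit-learn's `cross_validate` returns when called with `return_train_score=True` and a dict of named scorers: `fit_time`, `score_time`, then alternating `test_<name>`/`train_<name>` pairs. The calling code is not shown in this log, so the following is a minimal sketch under stated assumptions. The scorer names are inferred from the keys. The data are synthetic (`make_classification`). A plain `MinMaxScaler` stands in for the logged `ColumnTransformer`. The shuffled 10-fold `StratifiedKFold` is a guess. `roc_auc` is scored on hard predictions via `make_scorer(roc_auc_score)` because each logged test_roc_auc value equals the mean of recall and specificity for its fold (RidgeClassifierCV fold 1: (7/11 + 12/12)/2 = 0.818).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): the real feature table is not reproduced here
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # AUC computed from hard 0/1 predictions
    "jcc": make_scorer(jaccard_score),
}

# Simplified pipeline: MinMaxScaler replaces the logged ColumnTransformer (assumption)
pipe = Pipeline([("prep", MinMaxScaler()), ("model", RidgeClassifierCV(cv=10))])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```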
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01983237 0.0227282 0.0234642 0.02106881 0.01955771 0.02060318
0.02057695 0.02065086 0.02190328 0.02053022]
mean value: 0.021091580390930176
key: score_time
value: [0.01070762 0.01104808 0.01071906 0.01067495 0.01204062 0.0107677
0.01073122 0.01068878 0.01067734 0.01063418]
mean value: 0.0108689546585083
key: test_mcc
value: [0.58002308 0.48856385 0.23262105 0.65909298 0.65909298 0.83971912
0.91605722 0.82575758 1. 0.27272727]
mean value: 0.6473655141770323
key: train_mcc
value: [0.78548989 0.77565201 0.83417421 0.74754561 0.77565201 0.75613935
0.76601619 0.77565201 0.77673564 0.78640777]
mean value: 0.7779464673314072
key: test_accuracy
value: [0.7826087 0.73913043 0.60869565 0.82608696 0.82608696 0.91304348
0.95652174 0.91304348 1. 0.63636364]
mean value: 0.8201581027667985
key: train_accuracy
value: [0.89268293 0.88780488 0.91707317 0.87317073 0.88780488 0.87804878
0.88292683 0.88780488 0.88834951 0.89320388]
mean value: 0.8888870471228985
key: test_fscore
value: [0.73684211 0.75 0.64 0.8 0.84615385 0.90909091
0.96 0.91666667 1. 0.63636364]
mean value: 0.8195117163538216
key: train_fscore
value: [0.89423077 0.88780488 0.9178744 0.87735849 0.88780488 0.87804878
0.88349515 0.88780488 0.88888889 0.89320388]
mean value: 0.8896514988581322
key: test_precision
value: [0.875 0.69230769 0.57142857 0.88888889 0.78571429 1.
0.92307692 0.91666667 1. 0.63636364]
mean value: 0.8289446664446665
key: train_precision
value: [0.88571429 0.89215686 0.91346154 0.85321101 0.88349515 0.87378641
0.875 0.88349515 0.88461538 0.89320388]
mean value: 0.8838139663234891
key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 0.91666667 0.83333333
1. 0.91666667 1. 0.63636364]
mean value: 0.8212121212121212
key: train_recall
value: [0.90291262 0.88349515 0.9223301 0.90291262 0.89215686 0.88235294
0.89215686 0.89215686 0.89320388 0.89320388]
mean value: 0.895688178183895
key: test_roc_auc
value: [0.77651515 0.74242424 0.61363636 0.8219697 0.8219697 0.91666667
0.95454545 0.91287879 1. 0.63636364]
mean value: 0.8196969696969697
key: train_roc_auc
value: [0.89263278 0.887826 0.9170474 0.87302494 0.887826 0.87806967
0.88297164 0.887826 0.88834951 0.89320388]
mean value: 0.8888777841233582
key: test_jcc
value: [0.58333333 0.6 0.47058824 0.66666667 0.73333333 0.83333333
0.92307692 0.84615385 1. 0.46666667]
mean value: 0.7123152337858221
key: train_jcc
value: [0.80869565 0.79824561 0.84821429 0.78151261 0.79824561 0.7826087
0.79130435 0.79824561 0.8 0.80701754]
mean value: 0.8014089972373388
MCC on Blind test: 0.35
Accuracy on Blind test: 0.67
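Within each fold, test_jcc is determined by test_fscore. For one binary confusion matrix, F1 = 2TP/(2TP+FP+FN) and J = TP/(TP+FP+FN), so J = F1/(2 - F1). The check below applies this to the Logistic Regression values above. The values are copied by hand, and the helper `jaccard_from_f1` is illustrative only, not part of the original pipeline.

```python
# Per-fold test F1 and Jaccard values copied from the Logistic Regression block above
test_fscore = [0.73684211, 0.75, 0.64, 0.8, 0.84615385,
               0.90909091, 0.96, 0.91666667, 1.0, 0.63636364]
test_jcc = [0.58333333, 0.6, 0.47058824, 0.66666667, 0.73333333,
            0.83333333, 0.92307692, 0.84615385, 1.0, 0.46666667]


def jaccard_from_f1(f1):
    """For one binary confusion matrix, J = TP/(TP+FP+FN) = F1/(2 - F1)."""
    return f1 / (2.0 - f1)


# The identity holds fold by fold (up to the 8-decimal rounding in the log)
for f1, jcc in zip(test_fscore, test_jcc):
    assert abs(jaccard_from_f1(f1) - jcc) < 1e-6
```

The identity holds fold by fold but not for the averaged `mean value` lines. Applying it to the logged mean test_fscore (0.8195) gives about 0.694, while the logged mean test_jcc is 0.712.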
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
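The lbfgs ConvergenceWarning above is raised when LogisticRegressionCV reaches its iteration limit (`max_iter` defaults to 100). The sketch below applies the first remedy the warning suggests, raising `max_iter`, and records the warnings so they can be counted. The data are synthetic, 5000 is an arbitrary choice, and the log does not show whether this silences the warnings for the real dataset. The logged ColumnTransformer uses `remainder='passthrough'`, so any columns outside its two lists would reach the model unscaled; the log does not show whether such columns exist.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=50, random_state=42)

# Same scaler type as the logged pipeline; only max_iter is raised from its default of 100
pipe = Pipeline([
    ("prep", MinMaxScaler()),
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

# Record warnings instead of printing them, then count the convergence ones
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    pipe.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_convergence)
```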
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
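The `prep` step above is a ColumnTransformer with three parts:
- It min-max scales the numeric columns to [0, 1], using statistics learned during fit. Inside cross-validation, that means on each training fold.
- It one-hot encodes the seven categorical columns.
- It passes any remaining columns through unchanged.

Below is a simplified stdlib sketch of the first two transforms. `MiniPrep`, the row-as-dict layout and the `helix`/`strand` category values are hypothetical illustrations, not taken from the script. The sketch copies two scikit-learn defaults:
- A value outside the training range maps outside [0, 1], because MinMaxScaler uses `clip=False` by default.
- A category unseen during fit raises an error, matching OneHotEncoder's default `handle_unknown='error'`.

```python
from typing import Dict, List, Sequence


class MiniPrep:
    """Hypothetical, simplified stand-in for the 'prep' ColumnTransformer:
    min-max scales numeric columns, then one-hot encodes categorical ones."""

    def __init__(self, num_cols: Sequence[str], cat_cols: Sequence[str]) -> None:
        self.num_cols = list(num_cols)
        self.cat_cols = list(cat_cols)

    def fit(self, rows: List[Dict]) -> "MiniPrep":
        # Min/max per numeric column, learned from the training rows only.
        self.mins = {c: min(r[c] for r in rows) for c in self.num_cols}
        self.maxs = {c: max(r[c] for r in rows) for c in self.num_cols}
        # Sorted unique values per categorical column (like OneHotEncoder.categories_).
        self.cats = {c: sorted({r[c] for r in rows}) for c in self.cat_cols}
        return self

    def transform_row(self, row: Dict) -> List[float]:
        out: List[float] = []
        for c in self.num_cols:
            # A zero range is treated as 1, as scikit-learn does, to avoid dividing by zero.
            span = (self.maxs[c] - self.mins[c]) or 1.0
            out.append((row[c] - self.mins[c]) / span)  # not clipped to [0, 1]
        for c in self.cat_cols:
            if row[c] not in self.cats[c]:
                # Mirrors the default handle_unknown='error'.
                raise ValueError(f"unknown category {row[c]!r} in column {c!r}")
            out.extend(1.0 if row[c] == k else 0.0 for k in self.cats[c])
        return out


# Toy training rows: 'contacts' is a real column name, the categories are made up.
train = [
    {"contacts": 2, "ss_class": "helix"},
    {"contacts": 6, "ss_class": "strand"},
    {"contacts": 10, "ss_class": "helix"},
]
prep = MiniPrep(["contacts"], ["ss_class"]).fit(train)
print(prep.transform_row({"contacts": 6, "ss_class": "strand"}))  # [0.5, 0.0, 1.0]
```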
key: fit_time
value: [0.63725948 0.8160131 0.64619517 0.67831397 0.81363416 0.64932156
0.64309788 0.77185845 0.69442797 0.65856528]
mean value: 0.7008687019348144
key: score_time
value: [0.01391268 0.01416373 0.01514912 0.01428485 0.01425004 0.01425481
0.01451612 0.01446199 0.01425672 0.01421475]
mean value: 0.014346480369567871
key: test_mcc
value: [0.58002308 0.74242424 0.58930667 0.74047959 0.74242424 0.82575758
0.74242424 0.58930667 0.75592895 0.75592895]
mean value: 0.7064004193432867
key: train_mcc
value: [1. 0.99029034 0.93174679 1. 0.96116136 0.93174679
0.87355997 0.92194936 0.99033794 0.97091955]
mean value: 0.9571712098991773
key: test_accuracy
value: [0.7826087 0.86956522 0.7826087 0.86956522 0.86956522 0.91304348
0.86956522 0.7826087 0.86363636 0.86363636]
mean value: 0.8466403162055336
key: train_accuracy
value: [1. 0.99512195 0.96585366 1. 0.9804878 0.96585366
0.93658537 0.96097561 0.99514563 0.98543689]
mean value: 0.9785460573052333
key: test_fscore
value: [0.73684211 0.86956522 0.8 0.85714286 0.86956522 0.91666667
0.86956522 0.76190476 0.84210526 0.88 ]
mean value: 0.8403357306309251
key: train_fscore
value: [1. 0.99516908 0.96618357 1. 0.98058252 0.96551724
0.93719807 0.96078431 0.99512195 0.98550725]
mean value: 0.978606400161065
key: test_precision
value: [0.875 0.83333333 0.71428571 0.9 0.90909091 0.91666667
0.90909091 0.88888889 1. 0.78571429]
mean value: 0.8732070707070707
key: train_precision
value: [1. 0.99038462 0.96153846 1. 0.97115385 0.97029703
0.92380952 0.96078431 1. 0.98076923]
mean value: 0.9758737021084138
key: test_recall
value: [0.63636364 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667
0.83333333 0.66666667 0.72727273 1. ]
mean value: 0.825
key: train_recall
value: [1. 1. 0.97087379 1. 0.99019608 0.96078431
0.95098039 0.96078431 0.99029126 0.99029126]
mean value: 0.9814201408718828
key: test_roc_auc
value: [0.77651515 0.87121212 0.78787879 0.86742424 0.87121212 0.91287879
0.87121212 0.78787879 0.86363636 0.86363636]
mean value: 0.8473484848484848
key: train_roc_auc
value: [1. 0.99509804 0.96582905 1. 0.98053493 0.96582905
0.93665524 0.96097468 0.99514563 0.98543689]
mean value: 0.9785503521797069
key: test_jcc
value: [0.58333333 0.76923077 0.66666667 0.75 0.76923077 0.84615385
0.76923077 0.61538462 0.72727273 0.78571429]
mean value: 0.7282217782217782
key: train_jcc
value: [1. 0.99038462 0.93457944 1. 0.96190476 0.93333333
0.88181818 0.9245283 0.99029126 0.97142857]
mean value: 0.9588268467144515
MCC on Blind test: 0.08
Accuracy on Blind test: 0.54
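Every score in these blocks can be reproduced from a fold's confusion-matrix counts. Take the first test fold of the LogisticRegressionCV run above. It has recall 0.63636364 (7/11) and precision 0.875 (7/8) on 23 rows. That implies TP=7, FP=1, FN=4, TN=11. These counts are inferred from the logged ratios; the script does not print them.

The sketch recomputes the fold-1 scores from those counts with the standard formulas. The results agree with the logged test_mcc, test_accuracy, test_fscore, test_precision, test_recall and test_jcc.

The logged test_roc_auc for that fold (0.77651515) equals the balanced accuracy. This suggests the ROC AUC scorer receives hard 0/1 predictions rather than probabilities. That is an inference from the numbers, not something the log states.

```python
import math
from typing import Dict


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Binary-classification scores from confusion-matrix counts (positive class = 1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "precision": precision,
        "recall": recall,
        "jcc": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        # Balanced accuracy; equals ROC AUC when the scorer sees hard 0/1 predictions.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts inferred from fold 1 of the LogisticRegressionCV run above.
fold1 = metrics_from_counts(tp=7, fp=1, fn=4, tn=11)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```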
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00974512 0.00931644 0.00723195 0.00713229 0.00715876 0.00732064
0.00752759 0.00799894 0.00718284 0.00744987]
mean value: 0.00780644416809082
key: score_time
value: [0.01071715 0.00873137 0.00839925 0.0081389 0.008322 0.00864697
0.00856805 0.0087285 0.00848913 0.00818849]
mean value: 0.00869297981262207
key: test_mcc
value: [0.2096648 0.56879646 0.29359034 0.31298622 0.32232919 0.65151515
0.40451992 0.01343038 0.54232614 0.29277002]
mean value: 0.36119286228684955
key: train_mcc
value: [0.42185455 0.44881052 0.49019032 0.45523737 0.44991626 0.44991626
0.4598332 0.51034181 0.41615085 0.43864549]
mean value: 0.45408966367968817
key: test_accuracy
value: [0.56521739 0.73913043 0.60869565 0.65217391 0.60869565 0.82608696
0.65217391 0.52173913 0.72727273 0.63636364]
mean value: 0.6537549407114625
key: train_accuracy
value: [0.65853659 0.69268293 0.71707317 0.69268293 0.68780488 0.68780488
0.69756098 0.73658537 0.67961165 0.68932039]
mean value: 0.6939663746152025
key: test_fscore
value: [0.66666667 0.78571429 0.68965517 0.66666667 0.72727273 0.83333333
0.75 0.66666667 0.78571429 0.69230769]
mean value: 0.7263997496756118
key: train_fscore
value: [0.74452555 0.75675676 0.77165354 0.75862069 0.75384615 0.75384615
0.7578125 0.7768595 0.74418605 0.75193798]
mean value: 0.7570044879996563
key: test_precision
value: [0.52631579 0.64705882 0.55555556 0.61538462 0.57142857 0.83333333
0.6 0.52380952 0.64705882 0.6 ]
mean value: 0.6119945036044108
key: train_precision
value: [0.59649123 0.62820513 0.64900662 0.62658228 0.62025316 0.62025316
0.62987013 0.67142857 0.61935484 0.62580645]
mean value: 0.6287251578008078
key: test_recall
value: [0.90909091 1. 0.90909091 0.72727273 1. 0.83333333
1. 0.91666667 1. 0.81818182]
mean value: 0.9113636363636364
key: train_recall
value: [0.99029126 0.95145631 0.95145631 0.96116505 0.96078431 0.96078431
0.95098039 0.92156863 0.93203883 0.94174757]
mean value: 0.9522272986864648
key: test_roc_auc
value: [0.57954545 0.75 0.62121212 0.65530303 0.59090909 0.82575758
0.63636364 0.50378788 0.72727273 0.63636364]
mean value: 0.6526515151515152
key: train_roc_auc
value: [0.65691034 0.69141443 0.71592423 0.69136684 0.68913002 0.68913002
0.69879117 0.73748334 0.67961165 0.68932039]
mean value: 0.693908242908814
key: test_jcc
value: [0.5 0.64705882 0.52631579 0.5 0.57142857 0.71428571
0.6 0.5 0.64705882 0.52941176]
mean value: 0.5735559486952676
key: train_jcc
value: [0.59302326 0.60869565 0.62820513 0.61111111 0.60493827 0.60493827
0.61006289 0.63513514 0.59259259 0.60248447]
mean value: 0.609118678337316
MCC on Blind test: 0.48
Accuracy on Blind test: 0.71
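The same `key` / `value` / `mean value` layout repeats for every model. Each `mean value` is the arithmetic mean of the ten per-fold scores. Comparing models by eye is slow, so a small stdlib parser can pull the per-fold lists back out and set the cross-validated test_mcc next to the blind-test MCC.

`parse_model_block` is a hypothetical helper written against the layout visible in this log. It is not part of `ml_data.py` or the original scripts. The sample lines are copied from the Gaussian NB block above. For that model the CV mean test_mcc (about 0.36) sits below its blind-test MCC (0.48).

```python
import re
import statistics
from typing import Dict, List, Optional, Tuple

KEY_RE = re.compile(r"^key: (\S+)$")
BLIND_MCC_RE = re.compile(r"^MCC on Blind test: (-?[0-9.]+)$")


def parse_model_block(lines: List[str]) -> Tuple[Dict[str, List[float]], Optional[float]]:
    """Return ({metric: per-fold values}, blind-test MCC) for one model's block.

    Expected layout: 'key: <name>', then 'value: [v1 v2 ...' possibly wrapped
    over several lines and closed by ']', then 'mean value: <x>'.
    """
    scores: Dict[str, List[float]] = {}
    blind_mcc: Optional[float] = None
    key: Optional[str] = None
    buf: List[str] = []
    in_value = False
    for raw in lines:
        line = raw.strip()
        m = KEY_RE.match(line)
        if m:
            key = m.group(1)
            continue
        if line.startswith("value: ["):
            buf, in_value = [line[len("value: ["):]], True
        elif in_value:
            buf.append(line)
        if in_value:
            if buf[-1].endswith("]") and key is not None:
                # Join the wrapped lines, drop the bracket, parse each fold's score.
                scores[key] = [float(tok) for tok in " ".join(buf).rstrip("]").split()]
                in_value = False
            continue
        m = BLIND_MCC_RE.match(line)
        if m:
            blind_mcc = float(m.group(1))
    return scores, blind_mcc


sample = """\
key: test_mcc
value: [0.2096648 0.56879646 0.29359034 0.31298622 0.32232919 0.65151515
 0.40451992 0.01343038 0.54232614 0.29277002]
mean value: 0.36119286228684955
MCC on Blind test: 0.48
Accuracy on Blind test: 0.71
""".splitlines()

scores, blind = parse_model_block(sample)
cv_mean = statistics.mean(scores["test_mcc"])
print(f"Gaussian NB: CV mean test_mcc {cv_mean:.3f}, blind-test MCC {blind}")
```

Run over each model's block, this gives a table of CV against blind-test scores. The gap can be large: the LogisticRegressionCV block above has a CV mean test_mcc of about 0.71 but a blind-test MCC of 0.08.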
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0070138 0.00690508 0.00694633 0.00693202 0.00696373 0.00683618
0.00693846 0.0069778 0.0068872 0.00699329]
mean value: 0.006939387321472168
key: score_time
value: [0.00785279 0.00785279 0.00780988 0.00785208 0.0078671 0.00781655
0.00785685 0.00782061 0.0078702 0.00784469]
mean value: 0.007844352722167968
key: test_mcc
value: [ 0.39393939 0.06579517 -0.03816905 0.38932432 0.33946383 0.56490196
0.33946383 0.21452908 0.54772256 0.36514837]
mean value: 0.318211945518085
key: train_mcc
value: [0.39749865 0.36390677 0.37171873 0.369368 0.40852696 0.36225341
0.37286188 0.38354703 0.38043802 0.34401398]
mean value: 0.37541334377025193
key: test_accuracy
value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087
0.65217391 0.60869565 0.77272727 0.68181818]
mean value: 0.6541501976284585
key: train_accuracy
value: [0.69756098 0.67804878 0.68292683 0.68292683 0.70243902 0.67804878
0.68292683 0.68780488 0.68932039 0.66504854]
mean value: 0.6847051858868103
key: test_fscore
value: [0.69565217 0.59259259 0.5 0.66666667 0.73333333 0.8
0.73333333 0.66666667 0.7826087 0.66666667]
mean value: 0.6837520128824477
key: train_fscore
value: [0.71559633 0.71052632 0.71111111 0.70588235 0.71889401 0.7027027
0.70852018 0.71428571 0.7037037 0.70638298]
mean value: 0.7097605398121303
key: test_precision
value: [0.66666667 0.5 0.46153846 0.7 0.61111111 0.76923077
0.61111111 0.6 0.75 0.7 ]
mean value: 0.6369658119658119
key: train_precision
value: [0.67826087 0.648 0.6557377 0.66101695 0.67826087 0.65
0.65289256 0.6557377 0.67256637 0.62878788]
mean value: 0.6581260910571809
key: test_recall
value: [0.72727273 0.72727273 0.54545455 0.63636364 0.91666667 0.83333333
0.91666667 0.75 0.81818182 0.63636364]
mean value: 0.7507575757575757
key: train_recall
value: [0.75728155 0.78640777 0.77669903 0.75728155 0.76470588 0.76470588
0.7745098 0.78431373 0.73786408 0.80582524]
mean value: 0.7709594517418618
key: test_roc_auc
value: [0.6969697 0.53030303 0.48106061 0.69318182 0.64015152 0.78030303
0.64015152 0.60227273 0.77272727 0.68181818]
mean value: 0.6518939393939394
key: train_roc_auc
value: [0.69726823 0.67751761 0.68246716 0.68256235 0.70274129 0.67846945
0.68337141 0.68827337 0.68932039 0.66504854]
mean value: 0.6847039786788501
key: test_jcc
value: [0.53333333 0.42105263 0.33333333 0.5 0.57894737 0.66666667
0.57894737 0.5 0.64285714 0.5 ]
mean value: 0.5255137844611529
key: train_jcc
value: [0.55714286 0.55102041 0.55172414 0.54545455 0.56115108 0.54166667
0.54861111 0.55555556 0.54285714 0.54605263]
mean value: 0.5501236135597817
MCC on Blind test: 0.47
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00674987 0.00651455 0.00655532 0.00653768 0.0065999 0.007195
0.00682569 0.00735092 0.00722623 0.0072701 ]
mean value: 0.006882524490356446
key: score_time
value: [0.01375031 0.0088625 0.00889778 0.01025677 0.00893879 0.00916982
0.00883722 0.00970483 0.00950575 0.00965738]
mean value: 0.0097581148147583
key: test_mcc
value: [0.21452908 0.39393939 0.33371191 0.48075018 0.39727608 0.12878788
0.25495628 0.30240737 0.48795004 0.18898224]
mean value: 0.31832904389236555
key: train_mcc
value: [0.63902904 0.59060621 0.60982579 0.63382493 0.67133261 0.60982579
0.68889027 0.67805807 0.59504408 0.6617241 ]
mean value: 0.6378160892933147
key: test_accuracy
value: [0.60869565 0.69565217 0.65217391 0.73913043 0.69565217 0.56521739
0.60869565 0.65217391 0.72727273 0.59090909]
mean value: 0.6535573122529644
key: train_accuracy
value: [0.8195122 0.79512195 0.80487805 0.81463415 0.83414634 0.80487805
0.84390244 0.83902439 0.7961165 0.83009709]
mean value: 0.8182311153208619
key: test_fscore
value: [0.52631579 0.69565217 0.69230769 0.7 0.74074074 0.58333333
0.52631579 0.69230769 0.66666667 0.52631579]
mean value: 0.6349955667690221
key: train_fscore
value: [0.82125604 0.8 0.80769231 0.80412371 0.82474227 0.8019802
0.83838384 0.83743842 0.78571429 0.83568075]
mean value: 0.8157011822658049
key: test_precision
value: [0.625 0.66666667 0.6 0.77777778 0.66666667 0.58333333
0.71428571 0.64285714 0.85714286 0.625 ]
mean value: 0.6758730158730158
key: train_precision
value: [0.81730769 0.78504673 0.8 0.85714286 0.86956522 0.81
0.86458333 0.84158416 0.82795699 0.80909091]
mean value: 0.8282277885901212
key: test_recall
value: [0.45454545 0.72727273 0.81818182 0.63636364 0.83333333 0.58333333
0.41666667 0.75 0.54545455 0.45454545]
mean value: 0.621969696969697
key: train_recall
value: [0.82524272 0.81553398 0.81553398 0.75728155 0.78431373 0.79411765
0.81372549 0.83333333 0.74757282 0.86407767]
mean value: 0.8050732914525033
key: test_roc_auc
value: [0.60227273 0.6969697 0.65909091 0.73484848 0.68939394 0.56439394
0.61742424 0.64772727 0.72727273 0.59090909]
mean value: 0.6530303030303031
key: train_roc_auc
value: [0.8194841 0.79502189 0.80482581 0.81491529 0.83390444 0.80482581
0.84375595 0.83899676 0.7961165 0.83009709]
mean value: 0.8181943651246907
key: test_jcc
value: [0.35714286 0.53333333 0.52941176 0.53846154 0.58823529 0.41176471
0.35714286 0.52941176 0.5 0.35714286]
mean value: 0.47020469726352077
key: train_jcc
value: [0.69672131 0.66666667 0.67741935 0.67241379 0.70175439 0.66942149
0.72173913 0.72033898 0.64705882 0.71774194]
mean value: 0.6891275872151366
MCC on Blind test: 0.3
Accuracy on Blind test: 0.65
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
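The pipeline printed above scales the numerical columns with MinMaxScaler, one-hot encodes the categorical columns, passes any other columns through unchanged, and then fits the classifier. A minimal sketch of an equivalent construction follows. It assumes a small subset of the logged column names and a made-up toy data frame, used purely to exercise the code; the original script may assemble its pipeline differently.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.svm import SVC

# Illustrative subsets of the logged column lists (the log has 43 numerical
# and 7 categorical features; only a few are used here).
num_cols = ["ligand_distance", "ddg_foldx", "rsa"]
cat_cols = ["ss_class", "active_site"]


def build_pipeline(model):
    """Scale numerical columns to [0, 1], one-hot encode categorical columns,
    pass any remaining columns through, then fit `model`."""
    prep = ColumnTransformer(
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
        remainder="passthrough",
    )
    return Pipeline(steps=[("prep", prep), ("model", model)])


# Made-up toy data, only to show the pipeline runs end to end.
rng = np.random.default_rng(42)
n = 40
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "strand", "loop"], n),
    "active_site": rng.choice([0, 1], n),
})
y = np.tile([0, 1], n // 2)

pipe = build_pipeline(SVC(random_state=42))
pipe.fit(X, y)

# Output width: one column per numerical feature plus one per category level.
n_features_out = pipe.named_steps["prep"].transform(X).shape[1]
print(n_features_out)
```

Keeping the preprocessing inside the Pipeline means the scaler and encoder are refitted on each training fold during cross-validation, so no statistics leak from the test fold.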
key: fit_time
value: [0.00949764 0.00926423 0.00878119 0.00982475 0.00883627 0.00911903
0.00922585 0.00873303 0.00967121 0.00942421]
mean value: 0.00923774242401123
key: score_time
value: [0.00828195 0.00830245 0.00834298 0.00822258 0.00852466 0.00888777
0.00881886 0.00821614 0.00887179 0.00847054]
mean value: 0.008493971824645997
key: test_mcc
value: [0.47727273 0.48856385 0.3030303 0.48075018 0.58002308 0.76764947
0.74047959 0.58002308 0.68313005 0.09090909]
mean value: 0.5191831414788004
key: train_mcc
value: [0.76709739 0.73662669 0.77590489 0.75693529 0.75611614 0.69845687
0.70790488 0.71798813 0.73789886 0.74813718]
mean value: 0.7403066312362871
key: test_accuracy
value: [0.73913043 0.73913043 0.65217391 0.73913043 0.7826087 0.86956522
0.86956522 0.7826087 0.81818182 0.54545455]
mean value: 0.7537549407114624
key: train_accuracy
value: [0.88292683 0.86829268 0.88780488 0.87804878 0.87804878 0.84878049
0.85365854 0.85853659 0.86893204 0.87378641]
mean value: 0.8698816007577551
key: test_fscore
value: [0.72727273 0.75 0.63636364 0.7 0.81481481 0.85714286
0.88 0.81481481 0.77777778 0.54545455]
mean value: 0.7503641173641173
key: train_fscore
value: [0.88679245 0.86829268 0.88995215 0.88151659 0.87684729 0.85167464
0.85576923 0.86124402 0.86829268 0.87619048]
mean value: 0.8716572217358802
key: test_precision
value: [0.72727273 0.69230769 0.63636364 0.77777778 0.73333333 1.
0.84615385 0.73333333 1. 0.54545455]
mean value: 0.7691996891996892
key: train_precision
value: [0.86238532 0.87254902 0.87735849 0.86111111 0.88118812 0.8317757
0.83962264 0.8411215 0.87254902 0.85981308]
mean value: 0.8599474002688899
key: test_recall
value: [0.72727273 0.81818182 0.63636364 0.63636364 0.91666667 0.75
0.91666667 0.91666667 0.63636364 0.54545455]
mean value: 0.75
key: train_recall
value: [0.91262136 0.86407767 0.90291262 0.90291262 0.87254902 0.87254902
0.87254902 0.88235294 0.86407767 0.89320388]
mean value: 0.8839805825242718
key: test_roc_auc
value: [0.73863636 0.74242424 0.65151515 0.73484848 0.77651515 0.875
0.86742424 0.77651515 0.81818182 0.54545455]
mean value: 0.7526515151515151
key: train_roc_auc
value: [0.88278127 0.86831334 0.88773082 0.8779269 0.87802208 0.84889587
0.85375024 0.8586522 0.86893204 0.87378641]
mean value: 0.8698791166952218
key: test_jcc
value: [0.57142857 0.6 0.46666667 0.53846154 0.6875 0.75
0.78571429 0.6875 0.63636364 0.375 ]
mean value: 0.6098634698634698
key: train_jcc
value: [0.79661017 0.76724138 0.80172414 0.78813559 0.78070175 0.74166667
0.74789916 0.75630252 0.76724138 0.77966102]
mean value: 0.7727183777937642
MCC on Blind test: 0.45
Accuracy on Blind test: 0.72
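Each model block lists, per key, ten fold values followed by their mean, with paired test_ and train_ entries. That layout matches what scikit-learn's cross_validate returns when it is given a dict of named scorers and return_train_score=True. The sketch below reproduces the key names; the scorer names, the use of StratifiedKFold, and the synthetic data are assumptions. In spot checks, the logged test_roc_auc values equal balanced accuracy (e.g. SVM fold 1: (8/11 + 9/12) / 2 = 0.7386), which is what roc_auc_score gives on hard 0/1 predictions. The sketch scores ROC AUC that way, but treat this as an inference rather than a documented choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC

# Named scorers: cross_validate prefixes each name with "test_" and "train_".
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    # Assumption: AUC on hard predictions (equals balanced accuracy for 0/1 labels).
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

# Synthetic stand-in data; not the dataset from the log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(SVC(random_state=42), X, y, cv=cv,
                        scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```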
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
key: fit_time
value: [0.60935068 0.83438659 0.7129097 0.18024588 0.55924106 0.85111785
0.75755429 0.51584601 0.85538912 0.78311491]
mean value: 0.6659156084060669
key: score_time
value: [0.01095915 0.01519394 0.01093698 0.0109117 0.01094437 0.01403475
0.02076578 0.01094198 0.01319242 0.01348639]
mean value: 0.013136744499206543
key: test_mcc
value: [0.62050523 0.58930667 0.21969697 0.69084928 0.65151515 0.82575758
0.83743579 0.50168817 0.91287093 0.48795004]
mean value: 0.6337575793672167
key: train_mcc
value: [0.88361919 0.86356283 0.91435567 0.5161037 0.84404459 0.8742382
0.88447331 0.78922439 0.88349515 0.86407767]
mean value: 0.831719470643712
key: test_accuracy
value: [0.7826087 0.7826087 0.60869565 0.82608696 0.82608696 0.91304348
0.91304348 0.73913043 0.95454545 0.72727273]
mean value: 0.8073122529644269
key: train_accuracy
value: [0.94146341 0.93170732 0.95609756 0.75121951 0.92195122 0.93658537
0.94146341 0.89268293 0.94174757 0.93203883]
mean value: 0.9146957139474308
key: test_fscore
value: [0.70588235 0.8 0.60869565 0.77777778 0.83333333 0.91666667
0.92307692 0.78571429 0.95238095 0.66666667]
mean value: 0.7970194610731695
key: train_fscore
value: [0.94059406 0.93269231 0.95477387 0.72131148 0.92079208 0.93779904
0.94285714 0.89719626 0.94174757 0.93203883]
mean value: 0.9121802646431316
key: test_precision
value: [1. 0.71428571 0.58333333 1. 0.83333333 0.91666667
0.85714286 0.6875 1. 0.85714286]
mean value: 0.8449404761904762
key: train_precision
value: [0.95959596 0.92380952 0.98958333 0.825 0.93 0.91588785
0.91666667 0.85714286 0.94174757 0.93203883]
mean value: 0.9191472598782621
key: test_recall
value: [0.54545455 0.90909091 0.63636364 0.63636364 0.83333333 0.91666667
1. 0.91666667 0.90909091 0.54545455]
mean value: 0.7848484848484848
key: train_recall
value: [0.9223301 0.94174757 0.9223301 0.6407767 0.91176471 0.96078431
0.97058824 0.94117647 0.94174757 0.93203883]
mean value: 0.9085284599276604
key: test_roc_auc
value: [0.77272727 0.78787879 0.60984848 0.81818182 0.82575758 0.91287879
0.90909091 0.73106061 0.95454545 0.72727273]
mean value: 0.8049242424242424
key: train_roc_auc
value: [0.94155721 0.9316581 0.95626309 0.7517609 0.92190177 0.93670284
0.9416048 0.89291833 0.94174757 0.93203883]
mean value: 0.9148153436131735
key: test_jcc
value: [0.54545455 0.66666667 0.4375 0.63636364 0.71428571 0.84615385
0.85714286 0.64705882 0.90909091 0.5 ]
mean value: 0.6759716998687587
key: train_jcc
value: [0.88785047 0.87387387 0.91346154 0.56410256 0.85321101 0.88288288
0.89189189 0.81355932 0.88990826 0.87272727]
mean value: 0.8443469079318687
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
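The MLP block shows six ConvergenceWarnings. Most likely there is one for each of six of the ten folds in which the Adam optimiser hit max_iter=500 before its loss-improvement criterion was met. The sketch below uses synthetic data to show how such warnings can be counted programmatically and how the fitted n_iter_ relates to max_iter. The helper name fit_and_count is hypothetical, and the larger budget of 2000 iterations is an arbitrary illustrative value, not a setting derived from this data.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; not the dataset from the log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)


def fit_and_count(max_iter):
    """Hypothetical helper: fit an MLP, return (model, number of ConvergenceWarnings)."""
    model = MLPClassifier(max_iter=max_iter, random_state=42)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, not just the first
        model.fit(X, y)
    n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, n_conv


# Deliberately tiny budget: with the default n_iter_no_change=10 the solver
# cannot stop early within 5 epochs, so it runs to max_iter and warns once.
short_model, short_warnings = fit_and_count(max_iter=5)
print(short_model.n_iter_, short_warnings)

# A larger, arbitrary budget gives the optimiser more room.
long_model, long_warnings = fit_and_count(max_iter=2000)
print(long_model.n_iter_, long_warnings)
```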
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01073289 0.01022744 0.00823569 0.00801015 0.00778437 0.00778627
0.0077095 0.00775599 0.00761223 0.00784159]
mean value: 0.00836961269378662
key: score_time
value: [0.01050305 0.00813007 0.00807691 0.00800776 0.00780082 0.00781107
0.00772762 0.00769615 0.00769758 0.00771093]
mean value: 0.008116197586059571
key: test_mcc
value: [0.91666667 0.58930667 0.76277007 0.83743579 0.82575758 0.83971912
1. 0.91666667 0.81818182 0.73029674]
mean value: 0.8236801120713376
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95652174 0.7826087 0.86956522 0.91304348 0.91304348 0.91304348
1. 0.95652174 0.90909091 0.86363636]
mean value: 0.9077075098814229
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95652174 0.8 0.84210526 0.9 0.91666667 0.90909091
1. 0.95652174 0.90909091 0.85714286]
mean value: 0.9047140083410107
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91666667 0.71428571 1. 1. 0.91666667 1.
1. 1. 0.90909091 0.9 ]
mean value: 0.9356709956709957
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.72727273 0.81818182 0.91666667 0.83333333
1. 0.91666667 0.90909091 0.81818182]
mean value: 0.8848484848484849
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95833333 0.78787879 0.86363636 0.90909091 0.91287879 0.91666667
1. 0.95833333 0.90909091 0.86363636]
mean value: 0.9079545454545455
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91666667 0.66666667 0.72727273 0.81818182 0.84615385 0.83333333
1. 0.91666667 0.83333333 0.75 ]
mean value: 0.8308275058275059
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.51
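Each block closes with MCC and accuracy on the blind (held-out) set, printed to two decimal places. For the Decision Tree the contrast is stark: every train_ metric is 1.0 and the mean test_mcc is about 0.82, yet the blind-test MCC is 0.04. The sketch below shows how the two blind-test lines could be produced with scikit-learn's metric functions. The helper name blind_test_report and the toy labels are hypothetical.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef


def blind_test_report(y_true, y_pred):
    """Hypothetical helper: print and return blind-test MCC and accuracy, rounded to 2 dp."""
    mcc = round(matthews_corrcoef(y_true, y_pred), 2)
    acc = round(accuracy_score(y_true, y_pred), 2)
    print("MCC on Blind test:", mcc)
    print("Accuracy on Blind test:", acc)
    return mcc, acc


# Toy labels: TP=3, FN=1, TN=3, FP=1, so
# MCC = (3*3 - 1*1) / sqrt((3+1)*(3+1)*(3+1)*(3+1)) = 8 / 16 = 0.5
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
blind_test_report(y_true, y_pred)  # prints MCC 0.5 and accuracy 0.75
```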
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.0877099 0.08413386 0.08387136 0.08425379 0.08437729 0.0846684
0.08445048 0.08430481 0.08534908 0.08467293]
mean value: 0.08477919101715088
key: score_time
value: [0.01659155 0.01651287 0.01647377 0.01637912 0.01636243 0.01650667
0.01641607 0.01648188 0.01631093 0.01695395]
mean value: 0.016498923301696777
key: test_mcc
value: [0.48075018 0.76764947 0.66414149 0.91605722 0.74047959 1.
1. 1. 0.81818182 0.81818182]
mean value: 0.8205441588214263
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73913043 0.86956522 0.82608696 0.95652174 0.86956522 1.
1. 1. 0.90909091 0.90909091]
mean value: 0.9079051383399209
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7 0.88 0.83333333 0.95238095 0.88 1.
1. 1. 0.90909091 0.90909091]
mean value: 0.9063896103896104
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.78571429 0.76923077 1. 0.84615385 1.
1. 1. 0.90909091 0.90909091]
mean value: 0.8997058497058497
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.63636364 1. 0.90909091 0.90909091 0.91666667 1.
1. 1. 0.90909091 0.90909091]
mean value: 0.918939393939394
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73484848 0.875 0.82954545 0.95454545 0.86742424 1.
1. 1. 0.90909091 0.90909091]
mean value: 0.9079545454545455
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.53846154 0.78571429 0.71428571 0.90909091 0.78571429 1.
1. 1. 0.83333333 0.83333333]
mean value: 0.83999333999334
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.63
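In every block the jcc (Jaccard) values track the F-scores closely. That is expected. For a single set of binary predictions, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so J = F1/(2 - F1). The check below applies that identity to the Extra Trees test_fscore and test_jcc values copied from the log. The identity holds fold by fold, not between the two means (0.9064 would imply about 0.829, against a logged mean jcc of 0.840).

```python
def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score for the same binary predictions."""
    return f1 / (2 - f1)


# Extra Trees per-fold values copied from the log.
test_fscore = [0.7, 0.88, 0.83333333, 0.95238095, 0.88,
               1.0, 1.0, 1.0, 0.90909091, 0.90909091]
test_jcc = [0.53846154, 0.78571429, 0.71428571, 0.90909091, 0.78571429,
            1.0, 1.0, 1.0, 0.83333333, 0.83333333]

# Each fold agrees to within the 8-decimal rounding of the log.
max_gap = max(abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc))
print(max_gap < 1e-6)  # True
```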
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00697589 0.00684834 0.00684047 0.00689197 0.00688601 0.00683427
0.00684476 0.00691366 0.00708032 0.00721502]
mean value: 0.006933069229125977
key: score_time
value: [0.00782681 0.00778484 0.00773478 0.00783086 0.0077734 0.00773144
0.00774217 0.00777936 0.00781941 0.00777721]
mean value: 0.007780027389526367
key: test_mcc
value: [0.39393939 0.66414149 0.03816905 0.50168817 0.47727273 0.44411739
0.83971912 0.31252706 0.68313005 0.36514837]
mean value: 0.4719852822351723
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.69565217 0.82608696 0.52173913 0.73913043 0.73913043 0.69565217
0.91304348 0.65217391 0.81818182 0.68181818]
mean value: 0.7282608695652174
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.69565217 0.83333333 0.47619048 0.66666667 0.75 0.63157895
0.90909091 0.71428571 0.77777778 0.69565217]
mean value: 0.7150228172539385
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.76923077 0.5 0.85714286 0.75 0.85714286
1. 0.625 1. 0.66666667]
mean value: 0.7691849816849816
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 0.90909091 0.45454545 0.54545455 0.75 0.5
0.83333333 0.83333333 0.63636364 0.72727273]
mean value: 0.6916666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.6969697 0.82954545 0.51893939 0.73106061 0.73863636 0.70454545
0.91666667 0.64393939 0.81818182 0.68181818]
mean value: 0.728030303030303
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.53333333 0.71428571 0.3125 0.5 0.6 0.46153846
0.83333333 0.55555556 0.63636364 0.53333333]
mean value: 0.5680243367743367
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
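The fold-level numbers also reveal the cross-validation layout. Test accuracies are multiples of 1/23 in the first eight folds and of 1/22 in the last two (e.g. 17/23 = 0.739, 18/22 = 0.818). Train accuracies are multiples of 1/205 and 1/206, and test recalls imply 11 positives per fold except in folds 5 to 8, which have 12. That pattern is what recent versions of scikit-learn's StratifiedKFold produce with 10 splits over 228 rows split 114/114. This is an inference from the logged fractions; the log itself does not state the size of the cross-validated set. The sketch below checks the fold sizes under that assumption.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Assumption: a balanced training set of 228 rows (114 per class), labels 0/1.
y = np.array([0] * 114 + [1] * 114)
X = np.zeros((len(y), 1))  # feature values do not affect how folds are sized

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
test_sizes = []
test_positives = []
for _, test_idx in skf.split(X, y):
    test_sizes.append(len(test_idx))
    test_positives.append(int(y[test_idx].sum()))

print(test_sizes)      # [23, 23, 23, 23, 23, 23, 23, 23, 22, 22]
print(test_positives)  # [11, 11, 11, 11, 12, 12, 12, 12, 11, 11]
```

Shuffling changes which rows land in each fold, but not how many rows of each class each fold receives, so the sizes do not depend on random_state.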
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.09106922 1.09859252 1.08210254 1.08648729 1.08477712 1.10239196
1.09634829 1.08665371 1.15152287 1.16227579]
mean value: 1.1042221307754516
key: score_time
value: [0.09068799 0.14433622 0.09434557 0.09718037 0.09297609 0.09529161
0.09361553 0.08816409 0.09734464 0.0969758 ]
mean value: 0.09909179210662841
key: test_mcc
value: [0.74047959 0.6992059 0.66414149 1. 0.91605722 0.91666667
1. 1. 0.91287093 0.91287093]
mean value: 0.8762292726696667
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86956522 0.82608696 0.82608696 1. 0.95652174 0.95652174
1. 1. 0.95454545 0.95454545]
mean value: 0.9343873517786562
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.84615385 0.83333333 1. 0.96 0.95652174
1. 1. 0.95652174 0.95652174]
mean value: 0.9366195254021341
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.73333333 0.76923077 1. 0.92307692 1.
1. 1. 0.91666667 0.91666667]
mean value: 0.9158974358974359
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 1. 0.90909091 1. 1. 0.91666667
1. 1. 1. 1. ]
mean value: 0.9643939393939394
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86742424 0.83333333 0.82954545 1. 0.95454545 0.95833333
1. 1. 0.95454545 0.95454545]
mean value: 0.9352272727272727
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.73333333 0.71428571 1. 0.92307692 0.91666667
1. 1. 0.91666667 0.91666667]
mean value: 0.8870695970695971
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.55
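The "mean value" lines are unweighted averages of the ten fold scores, so a 22-row fold counts as much as a 23-row one. As a worked check, the Random Forest test_accuracy values correspond to 20, 19, 19, 23, 22, 22, 23 and 23 correct out of 23, then 21 and 21 out of 22. This is a reconstruction from the logged fractions and assumes those fold sizes. Their unweighted mean reproduces the logged 0.93438735, while pooling the counts gives 213/228, about 0.93421. The gap is small here because the fold sizes differ by only one row.

```python
from statistics import fmean

# (correct, fold size) per fold, reconstructed from the Random Forest
# test_accuracy values in the log (assumes eight 23-row folds, then two 22-row folds).
folds = [(20, 23), (19, 23), (19, 23), (23, 23), (22, 23),
         (22, 23), (23, 23), (23, 23), (21, 22), (21, 22)]

# What the log's "mean value" reports: each fold weighted equally.
unweighted = fmean(c / n for c, n in folds)

# Pooled accuracy: total correct over total test rows.
pooled = sum(c for c, _ in folds) / sum(n for _, n in folds)

print(unweighted)  # about 0.93439, matching the log
print(pooled)      # about 0.93421
```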
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
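The FutureWarnings that follow come from passing max_features='auto' to RandomForestClassifier. For classifiers, 'auto' meant the square root of the number of features, and the warning itself recommends 'sqrt' (or omitting the argument, since 'sqrt' is the default). The sketch below shows the same Random Forest2 configuration with the deprecated value replaced, fitted on synthetic data. All other parameters are copied from the log.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same settings as "Random Forest2" in the log, with max_features='auto'
# replaced by the equivalent, non-deprecated 'sqrt'.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)

# Synthetic stand-in data; not the dataset from the log.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
rf2.fit(X, y)

# Out-of-bag accuracy, available because oob_score=True.
print(round(rf2.oob_score_, 3))
```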
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.84751892 0.83743048 0.99197769 0.88181758 0.90272188 0.897192
0.83777547 0.89692569 0.9466176 0.89896107]
mean value: 0.893893837928772
key: score_time
value: [0.18853641 0.20011663 0.1846509 0.19059825 0.2371254 0.21251607
0.20379901 0.20380235 0.17668462 0.19526935]
mean value: 0.19930989742279054
key: test_mcc
value: [0.65909298 0.6992059 0.58930667 0.76277007 0.83743579 0.83971912
0.91605722 0.91605722 0.91287093 0.91287093]
mean value: 0.8045386839254043
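The `mean value` lines appear to be the plain arithmetic mean of the ten per-fold scores. A minimal stdlib check against the Random Forest2 `test_mcc` row above; the fold values are copied from the log, which prints them rounded to 8 decimals, so agreement is only expected to within a small tolerance:

```python
from statistics import mean

# Per-fold test MCC for "Random Forest2", copied from the log above
test_mcc = [0.65909298, 0.6992059, 0.58930667, 0.76277007, 0.83743579,
            0.83971912, 0.91605722, 0.91605722, 0.91287093, 0.91287093]
logged_mean = 0.8045386839254043

computed = mean(test_mcc)
print(f"computed mean: {computed:.10f}")
# Printed per-fold values are rounded, so compare with a tolerance
print("matches logged mean:", abs(computed - logged_mean) < 1e-7)
```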
key: train_mcc
value: [0.90516294 0.94216887 0.93386476 0.91325992 0.92355447 0.90523324
0.90523324 0.88720829 0.92389898 0.91473626]
mean value: 0.915432096195379
key: test_accuracy
value: [0.82608696 0.82608696 0.7826087 0.86956522 0.91304348 0.91304348
0.95652174 0.95652174 0.95454545 0.95454545]
mean value: 0.8952569169960475
key: train_accuracy
value: [0.95121951 0.97073171 0.96585366 0.95609756 0.96097561 0.95121951
0.95121951 0.94146341 0.96116505 0.95631068]
mean value: 0.9566256215960217
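The per-fold accuracies are consistent with 10-fold cross-validation over 228 training rows. That total is an inference from the accuracy denominators, not something the log prints: eight test folds of 23 rows and two of 22, leaving 205 or 206 rows for training. This is the split-size rule used by scikit-learn's `KFold`/`StratifiedKFold`, where the first `n % k` folds receive one extra row. A stdlib sketch that recovers the number of correct predictions per fold under that assumption:

```python
# Assumption: 228 rows in total, inferred from the accuracy denominators
N_ROWS, N_FOLDS = 228, 10

def fold_sizes(n, k):
    """Test-fold sizes: the first n % k folds get one extra row."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]

# Per-fold test accuracy for "Random Forest2", copied from the log above
test_accuracy = [0.82608696, 0.82608696, 0.7826087, 0.86956522, 0.91304348,
                 0.91304348, 0.95652174, 0.95652174, 0.95454545, 0.95454545]

sizes = fold_sizes(N_ROWS, N_FOLDS)
for acc, n_test in zip(test_accuracy, sizes):
    n_correct = round(acc * n_test)
    # Each logged accuracy should be an exact fraction n_correct / n_test
    assert abs(n_correct / n_test - acc) < 1e-7
    print(f"{n_correct}/{n_test} correct (train size {N_ROWS - n_test})")
```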
key: test_fscore
value: [0.8 0.84615385 0.8 0.84210526 0.92307692 0.90909091
0.96 0.96 0.95652174 0.95652174]
mean value: 0.8953470419740442
key: train_fscore
value: [0.95327103 0.97142857 0.96713615 0.95734597 0.96190476 0.95283019
0.95283019 0.94392523 0.96226415 0.95774648]
mean value: 0.9580682723989425
key: test_precision
value: [0.88888889 0.73333333 0.71428571 1. 0.85714286 1.
0.92307692 0.92307692 0.91666667 0.91666667]
mean value: 0.8873137973137973
key: train_precision
value: [0.91891892 0.95327103 0.93636364 0.93518519 0.93518519 0.91818182
0.91818182 0.90178571 0.93577982 0.92727273]
mean value: 0.9280125848126148
key: test_recall
value: [0.72727273 1. 0.90909091 0.72727273 1. 0.83333333
1. 1. 1. 1. ]
mean value: 0.9196969696969697
key: train_recall
value: [0.99029126 0.99029126 1. 0.98058252 0.99019608 0.99019608
0.99019608 0.99019608 0.99029126 0.99029126]
mean value: 0.9902531886541024
key: test_roc_auc
value: [0.8219697 0.83333333 0.78787879 0.86363636 0.90909091 0.91666667
0.95454545 0.95454545 0.95454545 0.95454545]
mean value: 0.8950757575757575
key: train_roc_auc
value: [0.95102798 0.97063583 0.96568627 0.95597754 0.96111746 0.95140872
0.95140872 0.94169998 0.96116505 0.95631068]
mean value: 0.9566438225775747
key: test_jcc
value: [0.66666667 0.73333333 0.66666667 0.72727273 0.85714286 0.83333333
0.92307692 0.92307692 0.91666667 0.91666667]
mean value: 0.8163902763902764
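For a single binary label, the Jaccard score J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) are linked by J = F1 / (2 - F1), so the `test_jcc` row can be recomputed from the `test_fscore` row. A stdlib check on the Random Forest2 values above (rounded log values, hence the tolerance):

```python
# Per-fold test F1 and Jaccard for "Random Forest2", copied from the log above
test_fscore = [0.8, 0.84615385, 0.8, 0.84210526, 0.92307692,
               0.90909091, 0.96, 0.96, 0.95652174, 0.95652174]
test_jcc = [0.66666667, 0.73333333, 0.66666667, 0.72727273, 0.85714286,
            0.83333333, 0.92307692, 0.92307692, 0.91666667, 0.91666667]

def jaccard_from_f1(f1):
    """J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN) imply J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

for f1, jcc in zip(test_fscore, test_jcc):
    print(f"F1={f1:.4f} -> J={jaccard_from_f1(f1):.4f} (logged {jcc:.4f})")
```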
key: train_jcc
value: [0.91071429 0.94444444 0.93636364 0.91818182 0.9266055 0.90990991
0.90990991 0.89380531 0.92727273 0.91891892]
mean value: 0.919612646503732
MCC on Blind test: 0.34
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02035141 0.00765848 0.00785589 0.00786734 0.00790954 0.0078063
0.00794077 0.00780129 0.00784397 0.00801635]
mean value: 0.00910513401031494
key: score_time
value: [0.01214242 0.00866485 0.00880075 0.00867152 0.00868607 0.00868559
0.0086391 0.00873423 0.00867414 0.00874686]
mean value: 0.009044551849365234
key: test_mcc
value: [ 0.39393939 0.06579517 -0.03816905 0.38932432 0.33946383 0.56490196
0.33946383 0.21452908 0.54772256 0.36514837]
mean value: 0.318211945518085
key: train_mcc
value: [0.39749865 0.36390677 0.37171873 0.369368 0.40852696 0.36225341
0.37286188 0.38354703 0.38043802 0.34401398]
mean value: 0.37541334377025193
key: test_accuracy
value: [0.69565217 0.52173913 0.47826087 0.69565217 0.65217391 0.7826087
0.65217391 0.60869565 0.77272727 0.68181818]
mean value: 0.6541501976284585
key: train_accuracy
value: [0.69756098 0.67804878 0.68292683 0.68292683 0.70243902 0.67804878
0.68292683 0.68780488 0.68932039 0.66504854]
mean value: 0.6847051858868103
key: test_fscore
value: [0.69565217 0.59259259 0.5 0.66666667 0.73333333 0.8
0.73333333 0.66666667 0.7826087 0.66666667]
mean value: 0.6837520128824477
key: train_fscore
value: [0.71559633 0.71052632 0.71111111 0.70588235 0.71889401 0.7027027
0.70852018 0.71428571 0.7037037 0.70638298]
mean value: 0.7097605398121303
key: test_precision
value: [0.66666667 0.5 0.46153846 0.7 0.61111111 0.76923077
0.61111111 0.6 0.75 0.7 ]
mean value: 0.6369658119658119
key: train_precision
value: [0.67826087 0.648 0.6557377 0.66101695 0.67826087 0.65
0.65289256 0.6557377 0.67256637 0.62878788]
mean value: 0.6581260910571809
key: test_recall
value: [0.72727273 0.72727273 0.54545455 0.63636364 0.91666667 0.83333333
0.91666667 0.75 0.81818182 0.63636364]
mean value: 0.7507575757575757
key: train_recall
value: [0.75728155 0.78640777 0.77669903 0.75728155 0.76470588 0.76470588
0.7745098 0.78431373 0.73786408 0.80582524]
mean value: 0.7709594517418618
key: test_roc_auc
value: [0.6969697 0.53030303 0.48106061 0.69318182 0.64015152 0.78030303
0.64015152 0.60227273 0.77272727 0.68181818]
mean value: 0.6518939393939394
key: train_roc_auc
value: [0.69726823 0.67751761 0.68246716 0.68256235 0.70274129 0.67846945
0.68337141 0.68827337 0.68932039 0.66504854]
mean value: 0.6847039786788501
key: test_jcc
value: [0.53333333 0.42105263 0.33333333 0.5 0.57894737 0.66666667
0.57894737 0.5 0.64285714 0.5 ]
mean value: 0.5255137844611529
key: train_jcc
value: [0.55714286 0.55102041 0.55172414 0.54545455 0.56115108 0.54166667
0.54861111 0.55555556 0.54285714 0.54605263]
mean value: 0.5501236135597817
MCC on Blind test: 0.47
Accuracy on Blind test: 0.73
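The first BernoulliNB test fold can be traced back to a confusion matrix. With 23 test rows, recall 0.727 (8/11) and precision 0.667 (8/12) imply TP = 8, FN = 3, FP = 4 and TN = 8; these counts are inferred from the logged ratios, not printed by the script. The stdlib sketch below recomputes that fold's metrics from those counts. The logged `test_roc_auc` matches (TPR + TNR) / 2, which is what ROC AUC reduces to when scored on hard 0/1 predictions rather than probabilities; that the script scores it this way is an assumption.

```python
from math import sqrt

# Confusion-matrix counts inferred (not logged) for BernoulliNB, fold 1
TP, FN, FP, TN = 8, 3, 4, 8

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * TP / (2 * TP + FP + FN)
jaccard = TP / (TP + FP + FN)
# Matthews correlation coefficient
mcc = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
# Balanced accuracy, equal to ROC AUC computed on hard labels
auc_hard = (recall + TN / (TN + FP)) / 2

# First-fold values printed in the log for BernoulliNB
logged = {"mcc": 0.39393939, "accuracy": 0.69565217, "fscore": 0.69565217,
          "precision": 0.66666667, "recall": 0.72727273,
          "roc_auc": 0.6969697, "jcc": 0.53333333}
computed = {"mcc": mcc, "accuracy": accuracy, "fscore": f1,
            "precision": precision, "recall": recall,
            "roc_auc": auc_hard, "jcc": jaccard}
for key in logged:
    print(f"{key:9s} computed={computed[key]:.8f} logged={logged[key]:.8f}")
```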
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09073043 0.0420742 0.04870391 0.04456186 0.04261518 0.04289222
0.04772949 0.04729652 0.04701734 0.04742146]
mean value: 0.05010426044464111
key: score_time
value: [0.00988626 0.01027012 0.01072621 0.00994897 0.00984526 0.0098393
0.01010942 0.01019359 0.01010847 0.01034665]
mean value: 0.010127425193786621
key: test_mcc
value: [0.82575758 0.6992059 0.74242424 0.91605722 0.74242424 0.91666667
0.91605722 1. 0.91287093 0.81818182]
mean value: 0.8489645823067301
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91304348 0.82608696 0.86956522 0.95652174 0.86956522 0.95652174
0.95652174 1. 0.95454545 0.90909091]
mean value: 0.9211462450592885
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.84615385 0.86956522 0.95238095 0.86956522 0.95652174
0.96 1. 0.95652174 0.90909091]
mean value: 0.9228890529760094
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.73333333 0.83333333 1. 0.90909091 1.
0.92307692 1. 0.91666667 0.90909091]
mean value: 0.9133682983682984
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 1. 0.90909091 0.90909091 0.83333333 0.91666667
1. 1. 1. 0.90909091]
mean value: 0.9386363636363636
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91287879 0.83333333 0.87121212 0.95454545 0.87121212 0.95833333
0.95454545 1. 0.95454545 0.90909091]
mean value: 0.921969696969697
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.73333333 0.76923077 0.90909091 0.76923077 0.91666667
0.92307692 1. 0.91666667 0.83333333]
mean value: 0.8603962703962704
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.52
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01050711 0.02742529 0.02827191 0.03134918 0.03139067 0.03144336
0.03741717 0.03171349 0.02641296 0.03151488]
mean value: 0.028744602203369142
key: score_time
value: [0.01013732 0.0210259 0.01963973 0.01852012 0.0103941 0.02017403
0.01831055 0.01890373 0.02068543 0.01523781]
mean value: 0.017302870750427246
key: test_mcc
value: [0.69084928 0.65151515 0.39393939 0.91605722 0.65151515 0.91666667
0.74047959 0.91605722 0.75592895 0.91287093]
mean value: 0.7545879558265087
key: train_mcc
value: [0.86404384 0.86356283 0.9024367 0.83417421 0.87321531 0.83418999
0.85370265 0.86358877 0.86407767 0.8544092 ]
mean value: 0.8607401153973653
key: test_accuracy
value: [0.82608696 0.82608696 0.69565217 0.95652174 0.82608696 0.95652174
0.86956522 0.95652174 0.86363636 0.95454545]
mean value: 0.8731225296442688
key: train_accuracy
value: [0.93170732 0.93170732 0.95121951 0.91707317 0.93658537 0.91707317
0.92682927 0.93170732 0.93203883 0.92718447]
mean value: 0.9303125739995264
key: test_fscore
value: [0.77777778 0.81818182 0.69565217 0.95238095 0.83333333 0.95652174
0.88 0.96 0.84210526 0.95238095]
mean value: 0.8668334010256207
key: train_fscore
value: [0.93333333 0.93269231 0.95145631 0.9178744 0.93658537 0.91707317
0.92682927 0.93203883 0.93203883 0.92682927]
mean value: 0.9306751090914163
key: test_precision
value: [1. 0.81818182 0.66666667 1. 0.83333333 1.
0.84615385 0.92307692 1. 1. ]
mean value: 0.9087412587412588
key: train_precision
value: [0.91588785 0.92380952 0.95145631 0.91346154 0.93203883 0.91262136
0.9223301 0.92307692 0.93203883 0.93137255]
mean value: 0.9258093821728087
key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.90909091 0.83333333 0.91666667
0.91666667 1. 0.72727273 0.90909091]
mean value: 0.8393939393939394
key: train_recall
value: [0.95145631 0.94174757 0.95145631 0.9223301 0.94117647 0.92156863
0.93137255 0.94117647 0.93203883 0.9223301 ]
mean value: 0.935665334094803
key: test_roc_auc
value: [0.81818182 0.82575758 0.6969697 0.95454545 0.82575758 0.95833333
0.86742424 0.95454545 0.86363636 0.95454545]
mean value: 0.871969696969697
key: train_roc_auc
value: [0.93161051 0.9316581 0.95121835 0.9170474 0.93660765 0.91709499
0.92685132 0.93175328 0.93203883 0.92718447]
mean value: 0.9303064915286503
key: test_jcc
value: [0.63636364 0.69230769 0.53333333 0.90909091 0.71428571 0.91666667
0.78571429 0.92307692 0.72727273 0.90909091]
mean value: 0.7747202797202797
key: train_jcc
value: [0.875 0.87387387 0.90740741 0.84821429 0.88073394 0.84684685
0.86363636 0.87272727 0.87272727 0.86363636]
mean value: 0.8704803631523815
MCC on Blind test: 0.1
Accuracy on Blind test: 0.55
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.00954151 0.00753498 0.0070641 0.00683141 0.00684738 0.00718474
0.00700259 0.00671387 0.00674605 0.00678587]
mean value: 0.007225251197814942
key: score_time
value: [0.00951552 0.00836635 0.0077734 0.00772834 0.00780702 0.00820851
0.00769925 0.00763559 0.00767422 0.00770903]
mean value: 0.008011722564697265
key: test_mcc
value: [ 0.38932432 0.58930667 0.23262105 0.38932432 0.38932432 0.66414149
0.56490196 -0.06579517 0.29277002 0.36514837]
mean value: 0.3811067348212412
key: train_mcc
value: [0.48336719 0.44537263 0.49337247 0.42577585 0.49527272 0.41611143
0.48421652 0.47567594 0.44763689 0.43896694]
mean value: 0.4605768597645512
key: test_accuracy
value: [0.69565217 0.7826087 0.60869565 0.69565217 0.69565217 0.82608696
0.7826087 0.47826087 0.63636364 0.68181818]
mean value: 0.6883399209486166
key: train_accuracy
value: [0.74146341 0.72195122 0.74634146 0.71219512 0.74634146 0.70731707
0.74146341 0.73658537 0.72330097 0.7184466 ]
mean value: 0.7295406109400899
key: test_fscore
value: [0.66666667 0.8 0.64 0.66666667 0.72 0.81818182
0.8 0.57142857 0.69230769 0.66666667]
mean value: 0.7041918081918082
key: train_fscore
value: [0.74881517 0.73488372 0.75471698 0.7255814 0.75700935 0.71698113
0.74881517 0.74766355 0.73239437 0.73148148]
mean value: 0.7398342306115098
key: test_precision
value: [0.7 0.71428571 0.57142857 0.7 0.69230769 0.9
0.76923077 0.5 0.6 0.7 ]
mean value: 0.6847252747252747
key: train_precision
value: [0.73148148 0.70535714 0.73394495 0.69642857 0.72321429 0.69090909
0.72477064 0.71428571 0.70909091 0.69911504]
mean value: 0.7128597836345258
key: test_recall
value: [0.63636364 0.90909091 0.72727273 0.63636364 0.75 0.75
0.83333333 0.66666667 0.81818182 0.63636364]
mean value: 0.7363636363636363
key: train_recall
value: [0.76699029 0.76699029 0.77669903 0.75728155 0.79411765 0.74509804
0.7745098 0.78431373 0.75728155 0.76699029]
mean value: 0.7690272225395012
key: test_roc_auc
value: [0.69318182 0.78787879 0.61363636 0.69318182 0.69318182 0.82954545
0.78030303 0.46969697 0.63636364 0.68181818]
mean value: 0.6878787878787879
key: train_roc_auc
value: [0.74133828 0.72173044 0.74619265 0.71197411 0.74657339 0.70750048
0.74162383 0.73681706 0.72330097 0.7184466 ]
mean value: 0.7295497810774796
key: test_jcc
value: [0.5 0.66666667 0.47058824 0.5 0.5625 0.69230769
0.66666667 0.4 0.52941176 0.5 ]
mean value: 0.5488141025641026
key: train_jcc
value: [0.59848485 0.58088235 0.60606061 0.56934307 0.60902256 0.55882353
0.59848485 0.59701493 0.57777778 0.57664234]
mean value: 0.5872536846384988
MCC on Blind test: 0.43
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
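The `prep` step above is a `ColumnTransformer` that applies `MinMaxScaler` to the numerical columns and `OneHotEncoder` to the categorical ones. `remainder='passthrough'` leaves any other columns untouched. Here is a rough, stdlib-only sketch of what those two transforms compute. It is not scikit-learn's implementation. The helper names `min_max_scale` and `one_hot` and the toy values are hypothetical.

```python
# Stdlib-only illustration of the two transforms named in the pipeline print.
# NOT scikit-learn's code; helper names and sample values are hypothetical.

def min_max_scale(values):
    """Rescale a numeric column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: map every value to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def one_hot(values):
    """Return (categories, rows); each row has a single 1 in its category's slot."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Toy stand-ins for a numerical column (e.g. ligand_distance) and a categorical one (e.g. ss_class)
distances = [2.0, 6.0, 10.0]
ss_class = ["helix", "sheet", "helix"]

print(min_max_scale(distances))  # [0.0, 0.5, 1.0]
print(one_hot(ss_class))         # (['helix', 'sheet'], [[1, 0], [0, 1], [1, 0]])
```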
key: fit_time
value: [0.00825429 0.01047587 0.01012659 0.01016808 0.01010203 0.00998354
0.01053905 0.01024556 0.01085734 0.01041651]
mean value: 0.010116887092590333
key: score_time
value: [0.00801587 0.01022434 0.01021957 0.01031947 0.01033974 0.01022935
0.01024055 0.01027536 0.01019835 0.01020384]
mean value: 0.010026645660400391
key: test_mcc
value: [0.65909298 0.42228828 0.37057951 0.76764947 0.76277007 0.74242424
0.55048188 0.40451992 0.91287093 0.75592895]
mean value: 0.6348606236325591
key: train_mcc
value: [0.85690497 0.73153872 0.87320324 0.70302948 0.80930285 0.8300002
0.72436632 0.58762141 0.80469539 0.81866523]
mean value: 0.7739327829063736
key: test_accuracy
value: [0.82608696 0.69565217 0.65217391 0.86956522 0.86956522 0.86956522
0.73913043 0.65217391 0.95454545 0.86363636]
mean value: 0.799209486166008
key: train_accuracy
value: [0.92682927 0.85365854 0.93658537 0.83414634 0.89756098 0.91219512
0.84390244 0.75609756 0.89805825 0.90776699]
mean value: 0.8766800852474544
key: test_fscore
value: [0.8 0.58823529 0.71428571 0.88 0.88888889 0.86956522
0.8 0.75 0.95238095 0.88 ]
mean value: 0.8123356067064507
key: train_fscore
value: [0.93023256 0.83333333 0.93719807 0.85714286 0.9058296 0.90625
0.86440678 0.80314961 0.89005236 0.91162791]
mean value: 0.8839223061619048
key: test_precision
value: [0.88888889 0.83333333 0.58823529 0.78571429 0.8 0.90909091
0.66666667 0.6 1. 0.78571429]
mean value: 0.7857643663526016
key: train_precision
value: [0.89285714 0.97402597 0.93269231 0.75555556 0.83471074 0.96666667
0.76119403 0.67105263 0.96590909 0.875 ]
mean value: 0.8629664142938084
key: test_recall
value: [0.72727273 0.45454545 0.90909091 1. 1. 0.83333333
1. 1. 0.90909091 1. ]
mean value: 0.8833333333333333
key: train_recall
value: [0.97087379 0.72815534 0.94174757 0.99029126 0.99019608 0.85294118
1. 1. 0.82524272 0.95145631]
mean value: 0.9250904245193223
key: test_roc_auc
value: [0.8219697 0.68560606 0.66287879 0.875 0.86363636 0.87121212
0.72727273 0.63636364 0.95454545 0.86363636]
mean value: 0.7962121212121211
key: train_roc_auc
value: [0.92661336 0.85427375 0.93656006 0.83338093 0.89801066 0.91190748
0.84466019 0.75728155 0.89805825 0.90776699]
mean value: 0.8768513230534933
key: test_jcc
value: [0.66666667 0.41666667 0.55555556 0.78571429 0.8 0.76923077
0.66666667 0.6 0.90909091 0.78571429]
mean value: 0.6955305805305805
key: train_jcc
value: [0.86956522 0.71428571 0.88181818 0.75 0.82786885 0.82857143
0.76119403 0.67105263 0.80188679 0.83760684]
mean value: 0.7943849686015007
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
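Each `mean value` line appears to be the unweighted mean of the ten per-fold scores printed above it. Recomputing it from the Passive Aggressive `test_accuracy` folds, for example, reproduces 0.7992. Those folds range from 0.65 to 0.95, so a spread estimate is worth reporting alongside the mean. The sketch below assumes the script does not already print one, and uses the fold values copied from the log.

```python
import statistics

# Per-fold test_accuracy for the Passive Aggressive model, copied from the log above
test_accuracy = [0.82608696, 0.69565217, 0.65217391, 0.86956522, 0.86956522,
                 0.86956522, 0.73913043, 0.65217391, 0.95454545, 0.86363636]

mean_acc = statistics.mean(test_accuracy)  # agrees with the logged "mean value" to ~1e-9
sd_acc = statistics.stdev(test_accuracy)   # sample standard deviation (not printed by the script)
print(f"mean={mean_acc:.4f} sd={sd_acc:.4f}")
```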
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00993443 0.01050615 0.00986814 0.00974298 0.01031899 0.01094747
0.00993729 0.01018286 0.01007271 0.01059127]
mean value: 0.010210227966308594
key: score_time
value: [0.01083398 0.01023293 0.01024818 0.01021266 0.01024604 0.01035571
0.01024151 0.01023602 0.01023817 0.01028705]
mean value: 0.01031322479248047
key: test_mcc
value: [0.56490196 0.58930667 0.48075018 1. 0.47923384 0.82575758
0.76277007 0.58930667 0.83205029 0.63636364]
mean value: 0.6760440880009018
key: train_mcc
value: [0.78910244 0.834498 0.74442173 0.81555702 0.74362503 0.82697375
0.85470694 0.87321531 0.82432211 0.83815726]
mean value: 0.814457959189338
key: test_accuracy
value: [0.7826087 0.7826087 0.73913043 1. 0.69565217 0.91304348
0.86956522 0.7826087 0.90909091 0.81818182]
mean value: 0.8292490118577075
key: train_accuracy
value: [0.88780488 0.91219512 0.86341463 0.90731707 0.85853659 0.91219512
0.92682927 0.93658537 0.90776699 0.91747573]
mean value: 0.903012076722709
key: test_fscore
value: [0.76190476 0.8 0.7 1. 0.77419355 0.91666667
0.88888889 0.76190476 0.91666667 0.81818182]
mean value: 0.8338407112600661
key: train_fscore
value: [0.89777778 0.91891892 0.84782609 0.90995261 0.87445887 0.91509434
0.92822967 0.93658537 0.91402715 0.91370558]
mean value: 0.9056576368372846
key: test_precision
value: [0.8 0.71428571 0.77777778 1. 0.63157895 0.91666667
0.8 0.88888889 0.84615385 0.81818182]
mean value: 0.8193533659323133
key: train_precision
value: [0.82786885 0.85714286 0.96296296 0.88888889 0.78294574 0.88181818
0.90654206 0.93203883 0.8559322 0.95744681]
mean value: 0.8853587382632707
key: test_recall
value: [0.72727273 0.90909091 0.63636364 1. 1. 0.91666667
1. 0.66666667 1. 0.81818182]
mean value: 0.8674242424242424
key: train_recall
value: [0.98058252 0.99029126 0.75728155 0.93203883 0.99019608 0.95098039
0.95098039 0.94117647 0.98058252 0.87378641]
mean value: 0.934789644012945
key: test_roc_auc
value: [0.78030303 0.78787879 0.73484848 1. 0.68181818 0.91287879
0.86363636 0.78787879 0.90909091 0.81818182]
mean value: 0.8276515151515151
key: train_roc_auc
value: [0.88735009 0.9118123 0.86393489 0.90719589 0.85917571 0.9123834
0.92694651 0.93660765 0.90776699 0.91747573]
mean value: 0.903064915286503
key: test_jcc
value: [0.61538462 0.66666667 0.53846154 1. 0.63157895 0.84615385
0.8 0.61538462 0.84615385 0.69230769]
mean value: 0.7252091767881241
key: train_jcc
value: [0.81451613 0.85 0.73584906 0.83478261 0.77692308 0.84347826
0.86607143 0.88073394 0.84166667 0.8411215 ]
mean value: 0.8285142667643652
MCC on Blind test: 0.15
Accuracy on Blind test: 0.57
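The blind-test lines report MCC next to accuracy, here 0.15 and 0.57 for the SGD model. MCC uses all four cells of the confusion matrix and is 0 for chance-level prediction. It can therefore sit well below what the accuracy alone suggests. The sketch below computes both metrics from confusion-matrix counts. The counts are hypothetical, because this log does not print the blind-test confusion matrix. Returning 0.0 when a marginal is empty is a convention assumed here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any row/column total is zero
    return (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts, NOT the blind-test results from this log
counts = dict(tp=30, tn=20, fp=15, fn=10)
print(round(mcc(**counts), 3), round(accuracy(**counts), 3))
```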
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.09048128 0.06841111 0.06838012 0.06843686 0.06842685 0.06856799
0.06842709 0.06839943 0.06843877 0.06855869]
mean value: 0.07065281867980958
key: score_time
value: [0.01429415 0.01409531 0.0138483 0.01383138 0.01385355 0.01397872
0.01384473 0.01389337 0.01391649 0.01385307]
mean value: 0.013940906524658203
key: test_mcc
value: [0.82575758 0.58930667 0.74242424 0.83743579 0.74242424 0.91666667
0.91605722 0.91666667 0.73029674 0.81818182]
mean value: 0.8035217636234444
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91304348 0.7826087 0.86956522 0.91304348 0.86956522 0.95652174
0.95652174 0.95652174 0.86363636 0.90909091]
mean value: 0.8990118577075099
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.8 0.86956522 0.9 0.86956522 0.95652174
0.96 0.95652174 0.85714286 0.90909091]
mean value: 0.8987498588368154
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.71428571 0.83333333 1. 0.90909091 1.
0.92307692 1. 0.9 0.90909091]
mean value: 0.9097968697968698
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.90909091 0.90909091 0.81818182 0.83333333 0.91666667
1. 0.91666667 0.81818182 0.90909091]
mean value: 0.8939393939393939
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91287879 0.78787879 0.87121212 0.90909091 0.87121212 0.95833333
0.95454545 0.95833333 0.86363636 0.90909091]
mean value: 0.8996212121212122
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.66666667 0.76923077 0.81818182 0.76923077 0.91666667
0.92307692 0.91666667 0.75 0.83333333]
mean value: 0.8196386946386947
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.51
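AdaBoost scores exactly 1.0 on every training-fold metric and averages 0.80 test MCC across the CV folds. On the blind test it drops to 0.04 MCC. The sketch below puts the two gaps side by side, using only numbers copied from the log.

```python
import statistics

# Per-fold MCC for the AdaBoost Classifier, copied from the log above
train_mcc = [1.0] * 10
test_mcc = [0.82575758, 0.58930667, 0.74242424, 0.83743579, 0.74242424,
            0.91666667, 0.91605722, 0.91666667, 0.73029674, 0.81818182]
blind_mcc = 0.04  # "MCC on Blind test" line

cv_gap = statistics.mean(train_mcc) - statistics.mean(test_mcc)  # train folds vs held-out folds
blind_gap = statistics.mean(test_mcc) - blind_mcc                # CV estimate vs blind test
print(f"train-CV gap: {cv_gap:.3f}, CV-blind gap: {blind_gap:.3f}")
```

The CV-to-blind gap is several times larger than the train-to-CV gap, so the CV folds look much more optimistic than the blind set. The log alone does not show why.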
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03047633 0.02374792 0.02200127 0.03589177 0.03106284 0.02499914
0.04321909 0.03124619 0.04236913 0.03466034]
mean value: 0.0319674015045166
key: score_time
value: [0.02318501 0.01713753 0.02661037 0.02022576 0.02133465 0.01644993
0.08056641 0.02024651 0.02388358 0.01804686]
mean value: 0.02676866054534912
key: test_mcc
value: [0.91605722 0.58930667 0.65151515 1. 0.74242424 0.91666667
0.91605722 0.91666667 0.91287093 0.91287093]
mean value: 0.8474435701866356
key: train_mcc
value: [0.99029126 0.98048734 1. 1. 0.99029034 1.
0.99029034 0.99029034 0.9613463 0.98076744]
mean value: 0.9883763362991548
key: test_accuracy
value: [0.95652174 0.7826087 0.82608696 1. 0.86956522 0.95652174
0.95652174 0.95652174 0.95454545 0.95454545]
mean value: 0.9213438735177866
key: train_accuracy
value: [0.99512195 0.9902439 1. 1. 0.99512195 1.
0.99512195 0.99512195 0.98058252 0.99029126]
mean value: 0.994160549372484
key: test_fscore
value: [0.95238095 0.8 0.81818182 1. 0.86956522 0.95652174
0.96 0.95652174 0.95652174 0.95652174]
mean value: 0.9226214944475815
key: train_fscore
value: [0.99512195 0.99029126 1. 1. 0.99507389 1.
0.99507389 0.99507389 0.98039216 0.99019608]
mean value: 0.9941223123526399
key: test_precision
value: [1. 0.71428571 0.81818182 1. 0.90909091 1.
0.92307692 1. 0.91666667 0.91666667]
mean value: 0.9197968697968698
key: train_precision
value: [1. 0.99029126 1. 1. 1. 1.
1. 1. 0.99009901 1. ]
mean value: 0.9980390272036912
key: test_recall
value: [0.90909091 0.90909091 0.81818182 1. 0.83333333 0.91666667
1. 0.91666667 1. 1. ]
mean value: 0.9303030303030303
key: train_recall
value: [0.99029126 0.99029126 1. 1. 0.99019608 1.
0.99019608 0.99019608 0.97087379 0.98058252]
mean value: 0.9902627070245574
key: test_roc_auc
value: [0.95454545 0.78787879 0.82575758 1. 0.87121212 0.95833333
0.95454545 0.95833333 0.95454545 0.95454545]
mean value: 0.921969696969697
key: train_roc_auc
value: [0.99514563 0.99024367 1. 1. 0.99509804 1.
0.99509804 0.99509804 0.98058252 0.99029126]
mean value: 0.9941557205406435
key: test_jcc
value: [0.90909091 0.66666667 0.69230769 1. 0.76923077 0.91666667
0.92307692 0.91666667 0.91666667 0.91666667]
mean value: 0.8627039627039627
key: train_jcc
value: [0.99029126 0.98076923 1. 1. 0.99019608 1.
0.99019608 0.99019608 0.96153846 0.98058252]
mean value: 0.9883769714009577
MCC on Blind test: 0.14
Accuracy on Blind test: 0.55
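The Bagging model is built with `oob_score=True`. Under scikit-learn's defaults (`bootstrap=True`, `max_samples=1.0`), each base estimator trains on a bootstrap sample the size of the training set. Any given row is left out of that sample with probability (1 - 1/n)^n, which approaches 1/e (about 0.368). The out-of-bag score is computed on those left-out rows. This log does not print the resulting `oob_score_`. The sketch below estimates the left-out fraction for a few hypothetical sample sizes.

```python
import math
import random

def expected_oob_fraction(n):
    """Probability that a given row is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n

def simulated_oob_fraction(n, seed=42):
    """Draw one bootstrap sample of size n; return the fraction of rows never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

for n in (100, 1000):  # hypothetical sample sizes
    print(n, round(expected_oob_fraction(n), 4), round(simulated_oob_fraction(n), 4))
```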
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.0428648 0.05020165 0.05070519 0.05125546 0.05086875 0.05098724
0.05125928 0.05102658 0.0502708 0.05003095]
mean value: 0.049947071075439456
key: score_time
value: [0.02246642 0.02030492 0.02142429 0.02147484 0.02218461 0.02291584
0.02219915 0.02224135 0.01964617 0.02045941]
mean value: 0.02153170108795166
key: test_mcc
value: [0.48075018 0.48856385 0.39393939 0.65909298 0.56490196 0.6992059
0.91666667 0.38932432 0.68313005 0.18257419]
mean value: 0.5458149483128502
key: train_mcc
value: [0.94306341 0.92211753 0.9024367 0.90261781 0.92211753 0.9024367
0.93175328 0.91224062 0.90291262 0.88366175]
mean value: 0.9125357952633087
key: test_accuracy
value: [0.73913043 0.73913043 0.69565217 0.82608696 0.7826087 0.82608696
0.95652174 0.69565217 0.81818182 0.59090909]
mean value: 0.76699604743083
key: train_accuracy
value: [0.97073171 0.96097561 0.95121951 0.95121951 0.96097561 0.95121951
0.96585366 0.95609756 0.95145631 0.94174757]
mean value: 0.9561496566421974
key: test_fscore
value: [0.7 0.75 0.69565217 0.8 0.8 0.8
0.95652174 0.72 0.77777778 0.57142857]
mean value: 0.7571380262249827
key: train_fscore
value: [0.97169811 0.96153846 0.95145631 0.95098039 0.96039604 0.95098039
0.96585366 0.95609756 0.95145631 0.94230769]
mean value: 0.9562764931842805
key: test_precision
value: [0.77777778 0.69230769 0.66666667 0.88888889 0.76923077 1.
1. 0.69230769 1. 0.6 ]
mean value: 0.8087179487179487
key: train_precision
value: [0.94495413 0.95238095 0.95145631 0.96039604 0.97 0.95098039
0.96116505 0.95145631 0.95145631 0.93333333]
mean value: 0.9527578826498
key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 0.83333333 0.66666667
0.91666667 0.75 0.63636364 0.54545455]
mean value: 0.7257575757575757
key: train_recall
value: [1. 0.97087379 0.95145631 0.94174757 0.95098039 0.95098039
0.97058824 0.96078431 0.95145631 0.95145631]
mean value: 0.9600323624595469
key: test_roc_auc
value: [0.73484848 0.74242424 0.6969697 0.8219697 0.78030303 0.83333333
0.95833333 0.69318182 0.81818182 0.59090909]
mean value: 0.7670454545454545
key: train_roc_auc
value: [0.97058824 0.96092709 0.95121835 0.95126594 0.96092709 0.95121835
0.96587664 0.95612031 0.95145631 0.94174757]
mean value: 0.9561345897582334
key: test_jcc
value: [0.53846154 0.6 0.53333333 0.66666667 0.66666667 0.66666667
0.91666667 0.5625 0.63636364 0.4 ]
mean value: 0.6187325174825175
key: train_jcc
value: [0.94495413 0.92592593 0.90740741 0.90654206 0.92380952 0.90654206
0.93396226 0.91588785 0.90740741 0.89090909]
mean value: 0.9163347710667489
MCC on Blind test: 0.34
Accuracy on Blind test: 0.67
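The per-fold `test_jcc` values track `test_fscore` exactly. For a single binary confusion matrix, F1 = 2TP/(2TP+FP+FN) and Jaccard = TP/(TP+FP+FN), so Jaccard = F1/(2 - F1). The sketch below checks that identity against the Gaussian Process folds, with values copied from the log. It is a cheap sanity check that both scores refer to the same (assumed positive) class. The identity holds fold by fold but not for the `mean value` lines, because the mapping is nonlinear.

```python
# Per-fold scores for the Gaussian Process model, copied from the log above
test_fscore = [0.7, 0.75, 0.69565217, 0.8, 0.8, 0.8,
               0.95652174, 0.72, 0.77777778, 0.57142857]
test_jcc = [0.53846154, 0.6, 0.53333333, 0.66666667, 0.66666667, 0.66666667,
            0.91666667, 0.5625, 0.63636364, 0.4]

def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score from the same confusion matrix."""
    return f1 / (2 - f1)

for f1, jcc in zip(test_fscore, test_jcc):
    # Logged values are rounded to 8 decimals, so compare with a tolerance
    assert abs(jaccard_from_f1(f1) - jcc) < 1e-6
print("per-fold test_jcc is consistent with F1 / (2 - F1)")
```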
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.14630938 0.13025904 0.12856793 0.12697887 0.1226666 0.12366438
0.12613249 0.12406492 0.1238749 0.1242547 ]
mean value: 0.127677321434021
key: score_time
value: [0.0091486 0.00918722 0.00929856 0.0083158 0.00833082 0.00817156
0.00841713 0.00818658 0.00882602 0.0082829 ]
mean value: 0.0086165189743042
key: test_mcc
value: [0.82575758 0.58930667 0.74242424 1. 0.74242424 1.
1. 1. 0.91287093 0.91287093]
mean value: 0.8725654585542313
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91304348 0.7826087 0.86956522 1. 0.86956522 1.
1. 1. 0.95454545 0.95454545]
mean value: 0.9343873517786562
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.8 0.86956522 1. 0.86956522 1.
1. 1. 0.95652174 0.95652174]
mean value: 0.9361264822134387
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.71428571 0.83333333 1. 0.90909091 1.
1. 1. 0.91666667 0.91666667]
mean value: 0.9199134199134199
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.90909091 0.90909091 1. 0.83333333 1.
1. 1. 1. 1. ]
mean value: 0.956060606060606
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91287879 0.78787879 0.87121212 1. 0.87121212 1.
1. 1. 0.95454545 0.95454545]
mean value: 0.9352272727272727
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.66666667 0.76923077 1. 0.76923077 1.
1. 1. 0.91666667 0.91666667]
mean value: 0.8871794871794871
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.55
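This part of the log covers six models. Ranked by mean CV test MCC, they come out close to the reverse of their ranking by blind-test MCC. Gradient Boosting has the highest CV MCC (0.87) but only 0.14 on the blind set. Gaussian Process has the lowest CV MCC (0.55) and the highest blind MCC (0.34). The sketch below tabulates the logged values, with CV means rounded to four decimals, and sorts them both ways.

```python
# (mean CV test_mcc, blind-test MCC) per model, copied from this part of the log
results = {
    "Passive Aggressive": (0.6349, 0.24),
    "Stochastic GDescent": (0.6760, 0.15),
    "AdaBoost Classifier": (0.8035, 0.04),
    "Bagging Classifier": (0.8474, 0.14),
    "Gaussian Process": (0.5458, 0.34),
    "Gradient Boosting": (0.8726, 0.14),
}

# sorted() is stable, so tied blind scores keep their dictionary order
by_cv = sorted(results, key=lambda m: results[m][0], reverse=True)
by_blind = sorted(results, key=lambda m: results[m][1], reverse=True)
print("ranked by CV MCC:   ", by_cv)
print("ranked by blind MCC:", by_blind)
```

With only six models and a tie on the blind side, this comparison is descriptive only. It does suggest that CV scores alone may be a weak guide to model selection for this data.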
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00901484 0.01138854 0.01411247 0.01156902 0.01335311 0.01166534
0.01175451 0.01167417 0.01154208 0.01187825]
mean value: 0.011795234680175782
key: score_time
value: [0.01050758 0.01051354 0.0105617 0.01048422 0.0105226 0.01054072
0.01047564 0.01055694 0.01061535 0.01278687]
mean value: 0.010756516456604004
key: test_mcc
value: [0.17236256 0.6992059 0.29359034 0.76764947 0.22268089 0.74242424
0. 0.22268089 0.56694671 0.61237244]
mean value: 0.4299913432106543
key: train_mcc
value: [0.56341118 0.60589978 0.60122852 0.61135735 0.48234717 0.56859428
0.4515346 0.56519801 0.56644742 0.60352167]
mean value: 0.5619539992238989
key: test_accuracy
value: [0.56521739 0.82608696 0.60869565 0.86956522 0.56521739 0.86956522
0.52173913 0.56521739 0.77272727 0.77272727]
mean value: 0.6936758893280632
key: train_accuracy
value: [0.74634146 0.7804878 0.7902439 0.7804878 0.68780488 0.76585366
0.66829268 0.74146341 0.76213592 0.76699029]
mean value: 0.7490101823348331
key: test_fscore
value: [0.64285714 0.84615385 0.68965517 0.88 0.70588235 0.86956522
0.68571429 0.70588235 0.8 0.81481481]
mean value: 0.7640525185227539
key: train_fscore
value: [0.796875 0.81632653 0.81545064 0.81781377 0.76119403 0.8
0.75 0.79377432 0.8 0.81102362]
mean value: 0.7962457910535393
key: test_precision
value: [0.52941176 0.73333333 0.55555556 0.78571429 0.54545455 0.90909091
0.52173913 0.54545455 0.71428571 0.6875 ]
mean value: 0.6527539784029553
key: train_precision
value: [0.66666667 0.70422535 0.73076923 0.70138889 0.61445783 0.69565217
0.6 0.65806452 0.69014085 0.68211921]
mean value: 0.6743484710173275
key: test_recall
value: [0.81818182 1. 0.90909091 1. 1. 0.83333333
1. 1. 0.90909091 1. ]
mean value: 0.946969696969697
key: train_recall
value: [0.99029126 0.97087379 0.9223301 0.98058252 1. 0.94117647
1. 1. 0.95145631 1. ]
mean value: 0.9756710451170759
key: test_roc_auc
value: [0.57575758 0.83333333 0.62121212 0.875 0.54545455 0.87121212
0.5 0.54545455 0.77272727 0.77272727]
mean value: 0.6912878787878788
key: train_roc_auc
value: [0.74514563 0.77955454 0.78959642 0.77950695 0.68932039 0.76670474
0.66990291 0.74271845 0.76213592 0.76699029]
mean value: 0.7491576242147344
key: test_jcc
value: [0.47368421 0.73333333 0.52631579 0.78571429 0.54545455 0.76923077
0.52173913 0.54545455 0.66666667 0.6875 ]
mean value: 0.6255093276288928
key: train_jcc
value: [0.66233766 0.68965517 0.6884058 0.69178082 0.61445783 0.66666667
0.6 0.65806452 0.66666667 0.68211921]
mean value: 0.6620154339856393
MCC on Blind test: 0.32
Accuracy on Blind test: 0.6
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01143074 0.0101707 0.01010108 0.0101788 0.01018381 0.01013422
0.01016164 0.01012945 0.01014853 0.010252 ]
mean value: 0.01028909683227539
key: score_time
value: [0.01026726 0.0102675 0.01020479 0.01026201 0.01026845 0.01027465
0.01028585 0.01047206 0.01028395 0.0104382 ]
mean value: 0.010302472114562988
key: test_mcc
value: [0.62050523 0.74242424 0.39393939 0.83743579 0.74047959 0.91666667
0.91605722 0.82575758 1. 0.83205029]
mean value: 0.7825316005376914
key: train_mcc
value: [0.84404459 0.82455974 0.86409538 0.82438607 0.86356283 0.81495251
0.83417421 0.84389872 0.83499081 0.83499081]
mean value: 0.8383655666865866
key: test_accuracy
value: [0.7826087 0.86956522 0.69565217 0.91304348 0.86956522 0.95652174
0.95652174 0.91304348 1. 0.90909091]
mean value: 0.8865612648221344
key: train_accuracy
value: [0.92195122 0.91219512 0.93170732 0.91219512 0.93170732 0.90731707
0.91707317 0.92195122 0.91747573 0.91747573]
mean value: 0.919104901728629
key: test_fscore
value: [0.70588235 0.86956522 0.69565217 0.9 0.88 0.95652174
0.96 0.91666667 1. 0.9 ]
mean value: 0.8784288150042625
key: train_fscore
value: [0.92307692 0.91176471 0.93069307 0.91262136 0.93069307 0.90547264
0.91625616 0.92156863 0.9178744 0.9178744 ]
mean value: 0.9187895340969339
key: test_precision
value: [1. 0.83333333 0.66666667 1. 0.84615385 1.
0.92307692 0.91666667 1. 1. ]
mean value: 0.9185897435897435
key: train_precision
value: [0.91428571 0.92079208 0.94949495 0.91262136 0.94 0.91919192
0.92079208 0.92156863 0.91346154 0.91346154]
mean value: 0.9225669804985783
key: test_recall
value: [0.54545455 0.90909091 0.72727273 0.81818182 0.91666667 0.91666667
1. 0.91666667 1. 0.81818182]
mean value: 0.8568181818181818
key: train_recall
value: [0.93203883 0.90291262 0.91262136 0.91262136 0.92156863 0.89215686
0.91176471 0.92156863 0.9223301 0.9223301 ]
mean value: 0.915191319246145
key: test_roc_auc
value: [0.77272727 0.87121212 0.6969697 0.90909091 0.86742424 0.95833333
0.95454545 0.91287879 1. 0.90909091]
mean value: 0.8852272727272728
key: train_roc_auc
value: [0.92190177 0.91224062 0.93180088 0.91219303 0.9316581 0.90724348
0.9170474 0.92194936 0.91747573 0.91747573]
mean value: 0.9190986103179135
key: test_jcc
value: [0.54545455 0.76923077 0.53333333 0.81818182 0.78571429 0.91666667
0.92307692 0.84615385 1. 0.81818182]
mean value: 0.7955994005994006
key: train_jcc
value: [0.85714286 0.83783784 0.87037037 0.83928571 0.87037037 0.82727273
0.84545455 0.85454545 0.84821429 0.84821429]
mean value: 0.8498708448708449
MCC on Blind test: 0.19
Accuracy on Blind test: 0.59
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.1247797 0.08746672 0.08165932 0.08129358 0.08163166 0.08223915
0.0814786 0.08128548 0.08197618 0.09512329]
mean value: 0.08789336681365967
key: score_time
value: [0.01067257 0.01061177 0.01049924 0.01049209 0.01054454 0.01051307
0.01054621 0.0105319 0.01048279 0.01061583]
mean value: 0.010550999641418457
key: test_mcc
value: [0.69084928 0.65151515 0.39393939 0.91605722 0.74047959 0.91666667
0.74047959 0.91605722 0.83205029 0.83205029]
mean value: 0.7630144710301748
key: train_mcc
value: [0.85400014 0.87352395 0.89272796 0.81467733 0.86356283 0.84389872
0.84389872 0.86358877 0.83499081 0.83499081]
mean value: 0.8519860051204752
key: test_accuracy
value: [0.82608696 0.82608696 0.69565217 0.95652174 0.86956522 0.95652174
0.86956522 0.95652174 0.90909091 0.90909091]
mean value: 0.8774703557312253
key: train_accuracy
value: [0.92682927 0.93658537 0.94634146 0.90731707 0.93170732 0.92195122
0.92195122 0.93170732 0.91747573 0.91747573]
mean value: 0.925934170021312
key: test_fscore
value: [0.77777778 0.81818182 0.69565217 0.95238095 0.88 0.95652174
0.88 0.96 0.9 0.9 ]
mean value: 0.8720514461384027
key: train_fscore
value: [0.92822967 0.93779904 0.94634146 0.90731707 0.93069307 0.92156863
0.92156863 0.93203883 0.9178744 0.9178744 ]
mean value: 0.9261305196150217
key: test_precision
value: [1. 0.81818182 0.66666667 1. 0.84615385 1.
0.84615385 0.92307692 1. 1. ]
mean value: 0.91002331002331
key: train_precision
value: [0.91509434 0.9245283 0.95098039 0.91176471 0.94 0.92156863
0.92156863 0.92307692 0.91346154 0.91346154]
mean value: 0.9235504994450611
key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.90909091 0.91666667 0.91666667
0.91666667 1. 0.81818182 0.81818182]
mean value: 0.8477272727272728
key: train_recall
value: [0.94174757 0.95145631 0.94174757 0.90291262 0.92156863 0.92156863
0.92156863 0.94117647 0.9223301 0.9223301 ]
mean value: 0.9288406624785837
key: test_roc_auc
value: [0.81818182 0.82575758 0.6969697 0.95454545 0.86742424 0.95833333
0.86742424 0.95454545 0.90909091 0.90909091]
mean value: 0.8761363636363636
key: train_roc_auc
value: [0.92675614 0.93651247 0.94636398 0.90733866 0.9316581 0.92194936
0.92194936 0.93175328 0.91747573 0.91747573]
mean value: 0.9259232819341329
key: test_jcc
value: [0.63636364 0.69230769 0.53333333 0.90909091 0.78571429 0.91666667
0.78571429 0.92307692 0.81818182 0.81818182]
mean value: 0.7818631368631369
key: train_jcc
value: [0.86607143 0.88288288 0.89814815 0.83035714 0.87037037 0.85454545
0.85454545 0.87272727 0.84821429 0.84821429]
mean value: 0.8626076726076726
MCC on Blind test: 0.12
Accuracy on Blind test: 0.56
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01771116 0.01344275 0.0129807 0.01292109 0.01246119 0.01508546
0.01303816 0.0129621 0.01367259 0.01364589]
mean value: 0.013792109489440919
key: score_time
value: [0.0105226 0.00815105 0.00787663 0.00783443 0.00784397 0.00815797
0.00775981 0.00775981 0.0080657 0.0077579 ]
mean value: 0.008172988891601562
key: test_mcc
value: [ 0.56407607 0.875 0.63245553 0.57735027 0.57735027 0.57735027
1. -0.14285714 0.31622777 0.28867513]
mean value: 0.5265628172174828
key: train_mcc
value: [0.8114612 0.76470609 0.70321085 0.75 0.73446466 0.78278036
0.71910121 0.75146915 0.73446466 0.78163175]
mean value: 0.7533289937125471
key: test_accuracy
value: [0.73333333 0.93333333 0.78571429 0.78571429 0.78571429 0.78571429
1. 0.42857143 0.64285714 0.64285714]
mean value: 0.7523809523809524
key: train_accuracy
value: [0.90551181 0.88188976 0.8515625 0.875 0.8671875 0.890625
0.859375 0.875 0.8671875 0.890625 ]
mean value: 0.876396407480315
key: test_fscore
value: [0.77777778 0.93333333 0.82352941 0.76923077 0.76923077 0.76923077
1. 0.42857143 0.70588235 0.66666667]
mean value: 0.7643453278747396
key: train_fscore
value: [0.9047619 0.88372093 0.85271318 0.875 0.86821705 0.89393939
0.86153846 0.87878788 0.86821705 0.89230769]
mean value: 0.8779203548389595
key: test_precision
value: [0.63636364 1. 0.7 0.83333333 0.83333333 0.83333333
1. 0.42857143 0.6 0.625 ]
mean value: 0.7489935064935065
key: train_precision
value: [0.91935484 0.86363636 0.84615385 0.875 0.86153846 0.86764706
0.84848485 0.85294118 0.86153846 0.87878788]
mean value: 0.8675082934143655
key: test_recall
value: [1. 0.875 1. 0.71428571 0.71428571 0.71428571
1. 0.42857143 0.85714286 0.71428571]
mean value: 0.8017857142857143
key: train_recall
value: [0.890625 0.9047619 0.859375 0.875 0.875 0.921875 0.875
0.90625 0.875 0.90625 ]
mean value: 0.8889136904761905
key: test_roc_auc
value: [0.75 0.9375 0.78571429 0.78571429 0.78571429 0.78571429
1. 0.42857143 0.64285714 0.64285714]
mean value: 0.7544642857142857
key: train_roc_auc
value: [0.90562996 0.88206845 0.8515625 0.875 0.8671875 0.890625
0.859375 0.875 0.8671875 0.890625 ]
mean value: 0.8764260912698413
key: test_jcc
value: [0.63636364 0.875 0.7 0.625 0.625 0.625
1. 0.27272727 0.54545455 0.5 ]
mean value: 0.6404545454545454
key: train_jcc
value: [0.82608696 0.79166667 0.74324324 0.77777778 0.76712329 0.80821918
0.75675676 0.78378378 0.76712329 0.80555556]
mean value: 0.782733649373018
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
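The per-fold keys logged above (test_mcc, test_accuracy, test_fscore, test_precision, test_recall, test_roc_auc, test_jcc) can all be recovered from the confusion-matrix counts of a fold. The following is a minimal stdlib sketch that assumes hard 0/1 predictions with 1 as the positive class. `fold_metrics` is a hypothetical helper and is not part of ml_data.py or pnca_config.py. The pipeline's actual scorers are presumably scikit-learn's, but that is an assumption. The illustrative labels give TP=7, FP=4, TN=4, FN=0. These counts reproduce the first Logistic Regression fold above: test_mcc 0.564, test_fscore 0.778, test_jcc 0.636, test_roc_auc 0.75.

```python
import math


def fold_metrics(y_true, y_pred):
    """Binary metrics from hard 0/1 labels (1 = positive class); hypothetical helper."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero (e.g. no predicted positives)
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    # MCC is taken as 0 when any marginal count is zero
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "mcc": mcc,
        "accuracy": (tp + tn) / len(pairs),
        "fscore": ratio(2 * precision * recall, precision + recall),
        "precision": precision,
        "recall": recall,
        # Assumption: scored from hard labels, where ROC AUC equals balanced accuracy
        "roc_auc": (recall + specificity) / 2,
        "jcc": ratio(tp, tp + fp + fn),    # Jaccard index of the positive class
    }


# Illustrative labels only: TP=7, FN=0, FP=4, TN=4 (15 samples)
y_true = [1] * 7 + [0] * 8
y_pred = [1] * 7 + [1] * 4 + [0] * 4

for name, value in fold_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.8f}")
```

MCC falls back to 0 when a marginal count is zero. As a result, a fold in which every sample is predicted positive scores MCC 0 with recall 1.0 and ROC AUC 0.5. That pattern matches the seventh QDA fold logged earlier.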
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
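The pipeline printed above pairs a `ColumnTransformer` with `LogisticRegressionCV`. The transformer applies MinMaxScaler to the numerical columns and OneHotEncoder to the categorical ones, and passes everything else through. The sketch below rebuilds that shape on a hypothetical four-column toy frame. The column names are borrowed from the log, but the values and labels are random, and `max_iter=1000` is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n = 60
# Hypothetical toy frame: column names taken from the log, values random.
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "rsa": rng.uniform(0, 1, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
y = pd.Series(rng.integers(0, 2, n))  # random binary labels

# Split columns by dtype, as the printed Index objects suggest.
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

prep = ColumnTransformer(
    transformers=[("num", MinMaxScaler(), num_cols),
                  ("cat", OneHotEncoder(), cat_cols)],
    remainder="passthrough")  # any other column would pass through unscaled

pipe = Pipeline(steps=[("prep", prep),
                       ("model", LogisticRegressionCV(max_iter=1000, random_state=42))])
pipe.fit(X, y)
preds = pipe.predict(X)
# 2 scaled numeric columns + 3 ss_class dummies + 2 active_site dummies = 7
n_features_out = pipe.named_steps["prep"].transform(X).shape[1]
print(preds[:10], n_features_out)
```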
key: fit_time
value: [0.38585448 0.3669436 0.36463881 0.37112308 0.37720537 0.37863564
0.39239931 0.38488078 0.38222742 0.37554836]
mean value: 0.37794568538665774
key: score_time
value: [0.00842977 0.00810385 0.00817847 0.00810599 0.00813341 0.00875115
0.00849652 0.00883865 0.00872874 0.0085032 ]
mean value: 0.008426976203918458
key: test_mcc
value: [0.56407607 0.46428571 0.71428571 0.4472136 0.8660254 0.57735027
1. 0.4472136 0.42857143 0.42857143]
mean value: 0.5937593224506033
key: train_mcc
value: [1. 0.95287698 1. 0.9379581 1. 0.95417386
0.98449518 0.96922337 1. 0.95324137]
mean value: 0.9751968870278498
key: test_accuracy
value: [0.73333333 0.73333333 0.85714286 0.71428571 0.92857143 0.78571429
1. 0.71428571 0.71428571 0.71428571]
mean value: 0.7895238095238095
key: train_accuracy
value: [1. 0.97637795 1. 0.96875 1. 0.9765625
0.9921875 0.984375 1. 0.9765625 ]
mean value: 0.9874815452755905
key: test_fscore
value: [0.77777778 0.75 0.85714286 0.66666667 0.92307692 0.76923077
1. 0.66666667 0.71428571 0.71428571]
mean value: 0.7839133089133089
key: train_fscore
value: [1. 0.97637795 1. 0.96923077 1. 0.97709924
0.99224806 0.98461538 1. 0.97674419]
mean value: 0.9876315591305296
key: test_precision
value: [0.63636364 0.75 0.85714286 0.8 1. 0.83333333
1. 0.8 0.71428571 0.71428571]
mean value: 0.8105411255411256
key: train_precision
value: [1. 0.96875 1. 0.95454545 1. 0.95522388
0.98461538 0.96969697 1. 0.96923077]
mean value: 0.9802062458685593
key: test_recall
value: [1. 0.75 0.85714286 0.57142857 0.85714286 0.71428571
1. 0.57142857 0.71428571 0.71428571]
mean value: 0.775
key: train_recall
value: [1. 0.98412698 1. 0.984375 1. 1.
1. 1. 1. 0.984375 ]
mean value: 0.9952876984126984
key: test_roc_auc
value: [0.75 0.73214286 0.85714286 0.71428571 0.92857143 0.78571429
1. 0.71428571 0.71428571 0.71428571]
mean value: 0.7910714285714286
key: train_roc_auc
value: [1. 0.97643849 1. 0.96875 1. 0.9765625
0.9921875 0.984375 1. 0.9765625 ]
mean value: 0.9874875992063492
key: test_jcc
value: [0.63636364 0.6 0.75 0.5 0.85714286 0.625
1. 0.5 0.55555556 0.55555556]
mean value: 0.6579617604617605
key: train_jcc
value: [1. 0.95384615 1. 0.94029851 1. 0.95522388
0.98461538 0.96969697 1. 0.95454545]
mean value: 0.9758226350763665
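The `key:`/`value:`/`mean value:` blocks match the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given a named scoring dict and `return_train_score=True`. That function returns `fit_time`, `score_time`, and then a `test_<name>`/`train_<name>` pair for each scorer, in the order seen here. The sketch below works from that assumption. The scorer names are copied from the keys. The synthetic data, the 10-fold `StratifiedKFold`, and the use of `make_scorer` on predicted labels are guesses about the script's setup.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic, mildly imbalanced stand-in data (assumption).
X, y = make_classification(n_samples=185, n_features=10, weights=[0.4],
                           random_state=42)

# Scorer names chosen to reproduce the logged keys (test_mcc, train_jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(GaussianNB(), X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```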
MCC on Blind test: 0.09
Accuracy on Blind test: 0.54
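The blind-test lines report MCC and accuracy rounded to two decimals. For this LogisticRegressionCV model they fall well below the cross-validated means: MCC 0.09 against 0.59, and accuracy 0.54 against 0.79. The sketch below shows that kind of hold-out scoring. It assumes synthetic data, and it assumes the blind set is a stratified `train_test_split` hold-out; the log does not say how the blind set was drawn.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=185, n_features=10, random_state=42)

# Hypothetical blind set: a stratified 25% hold-out never seen during training/CV.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_blind)

mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", mcc)
print("Accuracy on Blind test:", acc)
```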
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00919986 0.0087204 0.0068109 0.00678849 0.00648975 0.00657201
0.00651431 0.00667238 0.00647902 0.00650811]
mean value: 0.0070755243301391605
key: score_time
value: [0.0102222 0.01021814 0.00809312 0.00810909 0.00769401 0.00771046
0.0077498 0.00773859 0.00767064 0.0077889 ]
mean value: 0.008299493789672851
key: test_mcc
value: [ 0.56407607 0.34247476 0.2773501 0.63245553 0.52223297 0.
0.52223297 -0.17407766 0.17407766 0.31622777]
mean value: 0.3177050166425868
key: train_mcc
value: [0.42609813 0.36309219 0.42452948 0.45355737 0.43819207 0.40213949
0.40574111 0.51298918 0.40574111 0.46530981]
mean value: 0.4297389942041204
key: test_accuracy
value: [0.73333333 0.66666667 0.57142857 0.78571429 0.71428571 0.5
0.71428571 0.42857143 0.57142857 0.64285714]
mean value: 0.6328571428571429
key: train_accuracy
value: [0.68503937 0.61417323 0.6875 0.6875 0.6953125 0.6640625
0.671875 0.734375 0.671875 0.7109375 ]
mean value: 0.6822650098425197
key: test_fscore
value: [0.77777778 0.73684211 0.7 0.82352941 0.77777778 0.58823529
0.77777778 0.55555556 0.66666667 0.70588235]
mean value: 0.7110044719642242
key: train_fscore
value: [0.75 0.72 0.74683544 0.75609756 0.75159236 0.73939394
0.74074074 0.77922078 0.74074074 0.76129032]
mean value: 0.7485911883378328
key: test_precision
value: [0.63636364 0.63636364 0.53846154 0.7 0.63636364 0.5
0.63636364 0.45454545 0.54545455 0.6 ]
mean value: 0.5883916083916084
key: train_precision
value: [0.625 0.5625 0.62765957 0.62 0.6344086 0.6039604
0.6122449 0.66666667 0.6122449 0.64835165]
mean value: 0.6213036683594909
key: test_recall
value: [1. 0.875 1. 1. 1. 0.71428571
1. 0.71428571 0.85714286 0.85714286]
mean value: 0.9017857142857143
key: train_recall
value: [0.9375 1. 0.921875 0.96875 0.921875 0.953125 0.9375 0.9375
0.9375 0.921875]
mean value: 0.94375
key: test_roc_auc
value: [0.75 0.65178571 0.57142857 0.78571429 0.71428571 0.5
0.71428571 0.42857143 0.57142857 0.64285714]
mean value: 0.6330357142857143
key: train_roc_auc
value: [0.68303571 0.6171875 0.6875 0.6875 0.6953125 0.6640625
0.671875 0.734375 0.671875 0.7109375 ]
mean value: 0.6823660714285714
key: test_jcc
value: [0.63636364 0.58333333 0.53846154 0.7 0.63636364 0.41666667
0.63636364 0.38461538 0.5 0.54545455]
mean value: 0.5577622377622378
key: train_jcc
value: [0.6 0.5625 0.5959596 0.60784314 0.60204082 0.58653846
0.58823529 0.63829787 0.58823529 0.61458333]
mean value: 0.5984233804988544
MCC on Blind test: 0.42
Accuracy on Blind test: 0.69
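Each test_jcc value above can be cross-checked against the test_fscore value of the same fold. For one binary confusion matrix, Jaccard = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), so Jaccard = F1/(2 - F1). The small check below uses the Gaussian NB fold values copied from the log. The "mean value" lines appear to be the plain average over the 10 folds.

```python
import numpy as np

# Per-fold test_fscore and test_jcc for Gaussian NB, copied from the log above
test_fscore = np.array([0.77777778, 0.73684211, 0.7, 0.82352941, 0.77777778,
                        0.58823529, 0.77777778, 0.55555556, 0.66666667, 0.70588235])
test_jcc = np.array([0.63636364, 0.58333333, 0.53846154, 0.7, 0.63636364,
                     0.41666667, 0.63636364, 0.38461538, 0.5, 0.54545455])

# Jaccard = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), hence Jaccard = F1/(2 - F1)
jcc_from_f1 = test_fscore / (2 - test_fscore)
print(np.round(jcc_from_f1, 4))

# The "mean value" line is the arithmetic mean over the 10 folds
print(test_jcc.mean())
```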
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00683665 0.00669193 0.00670457 0.00669694 0.00665402 0.00670123
0.00669932 0.00667048 0.00668812 0.0066514 ]
mean value: 0.006699466705322265
key: score_time
value: [0.00776768 0.0077312 0.0078032 0.00768924 0.0077219 0.00774479
0.00771761 0.00768328 0.00771379 0.00770187]
mean value: 0.0077274560928344725
key: test_mcc
value: [-0.19642857 0.47245559 0.63245553 -0.14285714 0. 0.4472136
0.4472136 0. 0.28867513 0.1490712 ]
mean value: 0.20977989331042105
key: train_mcc
value: [0.35590281 0.40535457 0.36154406 0.34995662 0.43771378 0.36480373
0.40704579 0.34391797 0.37665889 0.375 ]
mean value: 0.37778982176154485
key: test_accuracy
value: [0.4 0.73333333 0.78571429 0.42857143 0.5 0.71428571
0.71428571 0.5 0.64285714 0.57142857]
mean value: 0.599047619047619
key: train_accuracy
value: [0.67716535 0.7007874 0.6796875 0.671875 0.71875 0.6796875
0.703125 0.671875 0.6875 0.6875 ]
mean value: 0.6877952755905512
key: test_fscore
value: [0.4 0.77777778 0.82352941 0.42857143 0.53333333 0.66666667
0.75 0.53333333 0.66666667 0.5 ]
mean value: 0.6079878618113912
key: train_fscore
value: [0.6962963 0.71641791 0.6962963 0.7 0.71428571 0.70503597
0.71212121 0.67692308 0.70149254 0.6875 ]
mean value: 0.7006369014906811
key: test_precision
value: [0.375 0.7 0.7 0.42857143 0.5 0.8
0.66666667 0.5 0.625 0.6 ]
mean value: 0.5895238095238096
key: train_precision
value: [0.66197183 0.67605634 0.66197183 0.64473684 0.72580645 0.65333333
0.69117647 0.66666667 0.67142857 0.6875 ]
mean value: 0.6740648335734973
key: test_recall
value: [0.42857143 0.875 1. 0.42857143 0.57142857 0.57142857
0.85714286 0.57142857 0.71428571 0.42857143]
mean value: 0.6446428571428571
key: train_recall
value: [0.734375 0.76190476 0.734375 0.765625 0.703125 0.765625
0.734375 0.6875 0.734375 0.6875 ]
mean value: 0.7308779761904762
key: test_roc_auc
value: [0.40178571 0.72321429 0.78571429 0.42857143 0.5 0.71428571
0.71428571 0.5 0.64285714 0.57142857]
mean value: 0.5982142857142857
key: train_roc_auc
value: [0.67671131 0.70126488 0.6796875 0.671875 0.71875 0.6796875
0.703125 0.671875 0.6875 0.6875 ]
mean value: 0.687797619047619
key: test_jcc
value: [0.25 0.63636364 0.7 0.27272727 0.36363636 0.5
0.6 0.36363636 0.5 0.33333333]
mean value: 0.45196969696969697
key: train_jcc
value: [0.53409091 0.55813953 0.53409091 0.53846154 0.55555556 0.54444444
0.55294118 0.51162791 0.54022989 0.52380952]
mean value: 0.5393391383841405
MCC on Blind test: 0.39
Accuracy on Blind test: 0.69
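The blind-test lines report MCC and accuracy rounded to two decimals. Below is a minimal sketch of how such lines can be produced with scikit-learn metrics. It uses hypothetical y_blind and y_pred arrays, since the script's actual blind-test code is not part of this log.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical blind-test labels and predictions, for illustration only
y_blind = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

# Round to two decimals to match the format of the log lines
mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
acc = round(accuracy_score(y_blind, y_pred), 2)
print("MCC on Blind test:", mcc)
print("Accuracy on Blind test:", acc)
```

Here TP = 3, TN = 3, FP = 1 and FN = 1. That gives accuracy 6/8 = 0.75 and MCC (3*3 - 1*1)/sqrt(4*4*4*4) = 0.5.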
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00661182 0.00731397 0.0074687 0.00730467 0.00664449 0.00739288
0.00724483 0.00748181 0.0071733 0.00716472]
mean value: 0.007180118560791015
key: score_time
value: [0.00903392 0.00946617 0.01499748 0.01391649 0.00954628 0.01419783
0.01409173 0.00956845 0.00938296 0.00936103]
mean value: 0.011356234550476074
key: test_mcc
value: [ 0.34247476 0.04029115 0.14285714 -0.1490712 -0.31622777 0.28867513
0.14285714 0.14285714 0.14285714 0.1490712 ]
mean value: 0.0926641847918602
key: train_mcc
value: [0.52955101 0.59052579 0.59491308 0.5172058 0.48729852 0.51568795
0.37518324 0.50221186 0.5787612 0.53229065]
mean value: 0.5223629097954794
key: test_accuracy
value: [0.66666667 0.53333333 0.57142857 0.42857143 0.35714286 0.64285714
0.57142857 0.57142857 0.57142857 0.57142857]
mean value: 0.5485714285714286
key: train_accuracy
value: [0.76377953 0.79527559 0.796875 0.7578125 0.7421875 0.7578125
0.6875 0.75 0.7890625 0.765625 ]
mean value: 0.7605930118110236
key: test_fscore
value: [0.54545455 0.63157895 0.57142857 0.5 0.18181818 0.66666667
0.57142857 0.57142857 0.57142857 0.5 ]
mean value: 0.53112326270221
key: train_fscore
value: [0.7761194 0.79365079 0.79032258 0.76691729 0.72727273 0.75590551
0.68253968 0.73770492 0.784 0.75806452]
mean value: 0.7572497426299365
key: test_precision
value: [0.75 0.54545455 0.57142857 0.44444444 0.25 0.625
0.57142857 0.57142857 0.57142857 0.6 ]
mean value: 0.5500613275613275
key: train_precision
value: [0.74285714 0.79365079 0.81666667 0.73913043 0.77192982 0.76190476
0.69354839 0.77586207 0.80327869 0.78333333]
mean value: 0.7682162102343593
key: test_recall
value: [0.42857143 0.75 0.57142857 0.57142857 0.14285714 0.71428571
0.57142857 0.57142857 0.57142857 0.42857143]
mean value: 0.5321428571428571
key: train_recall
value: [0.8125 0.79365079 0.765625 0.796875 0.6875 0.75
0.671875 0.703125 0.765625 0.734375 ]
mean value: 0.7481150793650794
key: test_roc_auc
value: [0.65178571 0.51785714 0.57142857 0.42857143 0.35714286 0.64285714
0.57142857 0.57142857 0.57142857 0.57142857]
mean value: 0.5455357142857142
key: train_roc_auc
value: [0.76339286 0.7952629 0.796875 0.7578125 0.7421875 0.7578125
0.6875 0.75 0.7890625 0.765625 ]
mean value: 0.7605530753968254
key: test_jcc
value: [0.375 0.46153846 0.4 0.33333333 0.1 0.5
0.4 0.4 0.4 0.33333333]
mean value: 0.3703205128205128
key: train_jcc
value: [0.63414634 0.65789474 0.65333333 0.62195122 0.57142857 0.60759494
0.51807229 0.58441558 0.64473684 0.61038961]
mean value: 0.6103963465355565
MCC on Blind test: 0.2
Accuracy on Blind test: 0.6
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00949454 0.00895095 0.0088582 0.00882721 0.00884652 0.00871706
0.00863242 0.00808167 0.00882864 0.00886631]
mean value: 0.00881035327911377
key: score_time
value: [0.00895262 0.00876713 0.00880933 0.00889349 0.00878167 0.00939393
0.00883389 0.00905347 0.00875926 0.00885344]
mean value: 0.008909821510314941
key: test_mcc
value: [ 0.49099025 0.6000992 0.31622777 0.42857143 0.28867513 0.57735027
0.57735027 -0.17407766 0. 0.1490712 ]
mean value: 0.3254257861286581
key: train_mcc
value: [0.76388889 0.75156113 0.70389875 0.68884672 0.67195703 0.73518314
0.62776482 0.67261436 0.67195703 0.76571848]
mean value: 0.7053390350757779
key: test_accuracy
value: [0.73333333 0.8 0.64285714 0.71428571 0.64285714 0.78571429
0.78571429 0.42857143 0.5 0.57142857]
mean value: 0.6604761904761904
key: train_accuracy
value: [0.88188976 0.87401575 0.8515625 0.84375 0.8359375 0.8671875
0.8125 0.8359375 0.8359375 0.8828125 ]
mean value: 0.8521530511811024
key: test_fscore
value: [0.75 0.82352941 0.70588235 0.71428571 0.61538462 0.76923077
0.76923077 0.55555556 0.53333333 0.5 ]
mean value: 0.673643252172664
key: train_fscore
value: [0.88188976 0.87878788 0.85496183 0.84848485 0.83464567 0.87022901
0.82089552 0.83969466 0.83464567 0.88372093]
mean value: 0.8547955778438756
key: test_precision
value: [0.66666667 0.77777778 0.6 0.71428571 0.66666667 0.83333333
0.83333333 0.45454545 0.5 0.6 ]
mean value: 0.6646608946608946
key: train_precision
value: [0.88888889 0.84057971 0.8358209 0.82352941 0.84126984 0.85074627
0.78571429 0.82089552 0.84126984 0.87692308]
mean value: 0.8405637742542732
key: test_recall
value: [0.85714286 0.875 0.85714286 0.71428571 0.57142857 0.71428571
0.71428571 0.71428571 0.57142857 0.42857143]
mean value: 0.7017857142857142
key: train_recall
value: [0.875 0.92063492 0.875 0.875 0.828125 0.890625
0.859375 0.859375 0.828125 0.890625 ]
mean value: 0.870188492063492
key: test_roc_auc
value: [0.74107143 0.79464286 0.64285714 0.71428571 0.64285714 0.78571429
0.78571429 0.42857143 0.5 0.57142857]
mean value: 0.6607142857142857
key: train_roc_auc
value: [0.88194444 0.87437996 0.8515625 0.84375 0.8359375 0.8671875
0.8125 0.8359375 0.8359375 0.8828125 ]
mean value: 0.8521949404761905
key: test_jcc
value: [0.6 0.7 0.54545455 0.55555556 0.44444444 0.625
0.625 0.38461538 0.36363636 0.33333333]
mean value: 0.5177039627039627
key: train_jcc
value: [0.78873239 0.78378378 0.74666667 0.73684211 0.71621622 0.77027027
0.69620253 0.72368421 0.71621622 0.79166667]
mean value: 0.747028106162106
MCC on Blind test: 0.33
Accuracy on Blind test: 0.67
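The test_roc_auc values in this log are consistent with ROC AUC computed on hard 0/1 predictions, in which case it equals balanced accuracy. In SVM fold 1, recall 0.857 (6/7) and precision 0.667 (6/9) on 15 samples imply 7 positives with 6 detected, and 8 negatives with 3 false positives. That gives (6/7 + 5/8)/2 = 0.7411, which is the reported value. The sketch below runs that check with labels reconstructed from those counts. Whether the script scores on predictions rather than decision values is an inference; the log does not state it.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Labels reconstructed from SVM fold 1 above (recall 6/7, precision 6/9, 15 samples):
# 7 positives (6 predicted positive), 8 negatives (3 predicted positive)
y_true = np.array([1] * 7 + [0] * 8)
y_pred = np.array([1] * 6 + [0] * 1 + [1] * 3 + [0] * 5)

# With hard 0/1 predictions the ROC curve has a single interior point,
# so its AUC equals (TPR + TNR) / 2, i.e. balanced accuracy
auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(round(auc_hard, 8), round(bal_acc, 8))
```

The same counts also reproduce that fold's test_accuracy (11/15 = 0.7333).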
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.4545188 0.45730591 0.4523077 0.58290815 0.4624598 0.48649025
0.46417975 0.6089313 0.45516968 0.47067261]
mean value: 0.4894943952560425
key: score_time
value: [0.01082301 0.01304746 0.01290703 0.01316142 0.01308894 0.01309848
0.01328969 0.01329231 0.01081371 0.01333547]
mean value: 0.012685751914978028
key: test_mcc
value: [0.49099025 0.05455447 0.42857143 0.42857143 0.57735027 0.57735027
1. 0.14285714 0. 0.1490712 ]
mean value: 0.38493164624692183
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73333333 0.53333333 0.71428571 0.71428571 0.78571429 0.78571429
1. 0.57142857 0.5 0.57142857]
mean value: 0.690952380952381
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.58823529 0.71428571 0.71428571 0.76923077 0.76923077
1. 0.57142857 0.53333333 0.5 ]
mean value: 0.6910030165912519
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 0.55555556 0.71428571 0.71428571 0.83333333 0.83333333
1. 0.57142857 0.5 0.6 ]
mean value: 0.6988888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.85714286 0.625 0.71428571 0.71428571 0.71428571 0.71428571
1. 0.57142857 0.57142857 0.42857143]
mean value: 0.6910714285714286
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.74107143 0.52678571 0.71428571 0.71428571 0.78571429 0.78571429
1. 0.57142857 0.5 0.57142857]
mean value: 0.6910714285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.41666667 0.55555556 0.55555556 0.625 0.625
1. 0.4 0.36363636 0.33333333]
mean value: 0.5474747474747474
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.15
Accuracy on Blind test: 0.57
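The MLP run emitted ConvergenceWarning because the stochastic optimizer reached max_iter=500 before converging. Two standard MLPClassifier options are raising max_iter and enabling early_stopping, which holds out 10% of the training data by default. The sketch below tries both and checks whether the warning still fires. It runs on synthetic make_classification data as a stand-in for the project's data, so it says nothing about how the logged scores would change.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data, roughly the size of one training fold in the log
X, y = make_classification(n_samples=128, n_features=50, random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    # More iterations, plus stopping once the validation score stops improving
    clf = MLPClassifier(max_iter=2000, early_stopping=True, random_state=42)
    clf.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("iterations run:", clf.n_iter_, "convergence warnings:", n_conv)
```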
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01064634 0.00947118 0.00758362 0.00736046 0.00720119 0.00724173
0.00704122 0.00714111 0.00716519 0.00737357]
mean value: 0.007822561264038085
key: score_time
value: [0.01261735 0.00878739 0.00803566 0.0079627 0.00771618 0.00781584
0.00776696 0.00778174 0.00768185 0.00766587]
mean value: 0.00838315486907959
key: test_mcc
value: [0.66143783 0.875 1. 0.52223297 0.71428571 0.8660254
0.74535599 0.1490712 0.8660254 0.28867513]
mean value: 0.6688109643082562
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8 0.93333333 1. 0.71428571 0.85714286 0.92857143
0.85714286 0.57142857 0.92857143 0.64285714]
mean value: 0.8233333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.93333333 1. 0.6 0.85714286 0.93333333
0.83333333 0.625 0.93333333 0.61538462]
mean value: 0.8154390217625511
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 1. 1. 1. 0.85714286 0.875
1. 0.55555556 0.875 0.66666667]
mean value: 0.8529365079365079
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 1. 0.42857143 0.85714286 1.
0.71428571 0.71428571 1. 0.57142857]
mean value: 0.8160714285714286
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 0.9375 1. 0.71428571 0.85714286 0.92857143
0.85714286 0.57142857 0.92857143 0.64285714]
mean value: 0.8250000000000001
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 0.875 1. 0.42857143 0.75 0.875
0.71428571 0.45454545 0.875 0.44444444]
mean value: 0.7116847041847042
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.55
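The key/value/mean blocks match the layout of the dictionary returned by scikit-learn's cross_validate with return_train_score=True and a dict of named scorers. That dictionary lists fit_time and score_time first, then test_<name> and train_<name> for each scorer in order. The sketch below assumes that layout and runs on synthetic make_classification data. The scorer names are chosen to match the log's keys and are not taken from the project's code.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score, make_scorer,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=142, n_features=20, random_state=42)

# Scorer names chosen to reproduce the log's key names (an assumption)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(DecisionTreeClassifier(random_state=42), X, y,
                         cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

As in the log's Decision Tree block, an unpruned tree fits its training folds perfectly, so every train_* metric is 1.0.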
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.07988238 0.07995939 0.07999921 0.08072162 0.07959414 0.08084035
0.08072901 0.08017445 0.08056831 0.07979488]
mean value: 0.08022637367248535
key: score_time
value: [0.01623416 0.01625252 0.01616454 0.01613855 0.01608205 0.01615548
0.01745152 0.01737761 0.01639533 0.01741838]
mean value: 0.016567015647888185
key: test_mcc
value: [0.37796447 0.875 0.8660254 0.71428571 0.8660254 0.8660254
1. 0.14285714 0.57735027 0.28867513]
mean value: 0.6574208945289839
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.93333333 0.92857143 0.85714286 0.92857143 0.92857143
1. 0.57142857 0.78571429 0.64285714]
mean value: 0.8242857142857143
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.70588235 0.93333333 0.93333333 0.85714286 0.92307692 0.92307692
1. 0.57142857 0.8 0.61538462]
mean value: 0.8262658909717733
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 1. 0.875 0.85714286 1. 1.
1. 0.57142857 0.75 0.66666667]
mean value: 0.8320238095238095
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.85714286 0.875 1. 0.85714286 0.85714286 0.85714286
1. 0.57142857 0.85714286 0.57142857]
mean value: 0.8303571428571428
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.67857143 0.9375 0.92857143 0.85714286 0.92857143 0.92857143
1. 0.57142857 0.78571429 0.64285714]
mean value: 0.8258928571428572
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.54545455 0.875 0.875 0.75 0.85714286 0.85714286
1. 0.4 0.66666667 0.44444444]
mean value: 0.7270851370851371
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.28
Accuracy on Blind test: 0.64
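The Extra Trees fold-1 test scores above (accuracy 0.667, precision 0.6, recall 0.857, F1 0.706, Jaccard 0.545, MCC 0.378, ROC AUC 0.679) all fit one confusion matrix. The log does not print confusion matrices. The counts below (TP=6, FN=1, FP=4, TN=4, 15 samples) are inferred from the printed values, not read from the run. The sketch recomputes each metric from those counts with the standard formulas. ROC AUC is computed as balanced accuracy, which is what ROC AUC reduces to when it receives hard 0/1 predictions rather than probabilities. That the pipeline scored folds on hard labels is an assumption.

```python
import math

# Confusion-matrix counts INFERRED (not logged) for Extra Trees, CV fold 1.
tp, fn, fp, tn = 6, 1, 4, 4

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # sensitivity / true positive rate
specificity = tn / (tn + fp)     # true negative rate
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# With hard labels the ROC "curve" has one interior point, so AUC = balanced accuracy.
roc_auc = (recall + specificity) / 2

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", f1), ("jcc", jaccard),
                    ("mcc", mcc), ("roc_auc", roc_auc)]:
    print(f"{name}: {value:.8f}")
```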
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00677133 0.00666976 0.00663257 0.00673056 0.00663328 0.00662422
0.00719857 0.00701356 0.00682855 0.00671649]
mean value: 0.006781888008117676
key: score_time
value: [0.00769711 0.00769949 0.00777817 0.00801635 0.00804901 0.00826836
0.00769639 0.00777459 0.00769544 0.00789642]
mean value: 0.007857131958007812
key: test_mcc
value: [-0.07142857 0.33928571 0.28867513 0.28867513 0.42857143 0.
0.4472136 0.1490712 0. 0. ]
mean value: 0.1870063634618141
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.46666667 0.66666667 0.64285714 0.64285714 0.71428571 0.5
0.71428571 0.57142857 0.5 0.5 ]
mean value: 0.5919047619047619
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.42857143 0.66666667 0.61538462 0.66666667 0.71428571 0.46153846
0.75 0.5 0.58823529 0.36363636]
mean value: 0.5754985210867564
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.42857143 0.71428571 0.66666667 0.625 0.71428571 0.5
0.66666667 0.6 0.5 0.5 ]
mean value: 0.591547619047619
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.42857143 0.625 0.57142857 0.71428571 0.71428571 0.42857143
0.85714286 0.42857143 0.71428571 0.28571429]
mean value: 0.5767857142857142
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.46428571 0.66964286 0.64285714 0.64285714 0.71428571 0.5
0.71428571 0.57142857 0.5 0.5 ]
mean value: 0.5919642857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.27272727 0.5 0.44444444 0.5 0.55555556 0.3
0.6 0.33333333 0.41666667 0.22222222]
mean value: 0.41449494949494947
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
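Each `mean value` line is the arithmetic mean of the ten per-fold values printed above it. The check below uses the Extra Tree block, with values copied from the log and rounded to 8 decimals as printed. It also makes the train/test gap explicit. The single extremely randomized tree is grown with default settings and scores MCC 1.0 on every training fold. On the held-out folds it scores only about 0.19.

```python
from statistics import mean

# Per-fold MCC values copied from the Extra Tree block above.
test_mcc = [-0.07142857, 0.33928571, 0.28867513, 0.28867513, 0.42857143, 0.0,
            0.4472136, 0.1490712, 0.0, 0.0]
train_mcc = [1.0] * 10

cv_test_mcc = mean(test_mcc)
cv_train_mcc = mean(train_mcc)
# A large gap means the tree fits its training folds perfectly but generalises poorly.
gap = cv_train_mcc - cv_test_mcc

print(f"mean test MCC:  {cv_test_mcc:.4f}")
print(f"mean train MCC: {cv_train_mcc:.4f}")
print(f"train-test gap: {gap:.4f}")
```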
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.03270388 1.02527833 1.038306 1.02501392 1.02789044 1.00910234
1.033144 1.01801777 1.01170015 1.01512265]
mean value: 1.0236279487609863
key: score_time
value: [0.09661889 0.0889287 0.09118915 0.0894289 0.09080982 0.08694053
0.09180784 0.09045506 0.08735704 0.09318399]
mean value: 0.09067199230194092
key: test_mcc
value: [0.56407607 0.875 1. 0.71428571 0.71428571 1.
1. 0. 0.8660254 0.57735027]
mean value: 0.7311023176363259
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73333333 0.93333333 1. 0.85714286 0.85714286 1.
1. 0.5 0.92857143 0.78571429]
mean value: 0.8595238095238095
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.77777778 0.93333333 1. 0.85714286 0.85714286 1.
1. 0.53333333 0.93333333 0.8 ]
mean value: 0.8692063492063492
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.63636364 1. 1. 0.85714286 0.85714286 1.
1. 0.5 0.875 0.75 ]
mean value: 0.847564935064935
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 1. 0.85714286 0.85714286 1.
1. 0.57142857 1. 0.85714286]
mean value: 0.9017857142857143
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.9375 1. 0.85714286 0.85714286 1.
1. 0.5 0.92857143 0.78571429]
mean value: 0.8616071428571429
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.63636364 0.875 1. 0.75 0.75 1.
1. 0.36363636 0.875 0.66666667]
mean value: 0.7916666666666666
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.59
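For binary labels, the positive-class Jaccard score depends on F1 alone. With J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), it follows that J = F1/(2 - F1). The sketch applies this identity to the Random Forest per-fold `test_fscore` values and compares the results with the logged `test_jcc` values. It assumes both scores were computed for the positive class (label 1), which is the binary default in scikit-learn's `f1_score` and `jaccard_score`.

```python
# Per-fold values copied from the Random Forest block above.
test_fscore = [0.77777778, 0.93333333, 1.0, 0.85714286, 0.85714286, 1.0,
               1.0, 0.53333333, 0.93333333, 0.8]
test_jcc = [0.63636364, 0.875, 1.0, 0.75, 0.75, 1.0,
            1.0, 0.36363636, 0.875, 0.66666667]


def jaccard_from_f1(f1: float) -> float:
    """Positive-class Jaccard index implied by a binary F1 score."""
    return f1 / (2.0 - f1)


predicted = [jaccard_from_f1(f) for f in test_fscore]
max_error = max(abs(p - j) for p, j in zip(predicted, test_jcc))
print(f"largest |predicted - logged| Jaccard difference: {max_error:.1e}")
```

The residual differences come only from the 8-decimal rounding in the log.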
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.85148787 0.9046824 0.85188794 0.83856058 0.89779782 0.83885241
0.90653539 0.82567453 0.8553915 0.97453666]
mean value: 0.8745407104492188
key: score_time
value: [0.34471321 0.19580126 0.22999787 0.1582098 0.21987033 0.21252227
0.18508577 0.23894954 0.15599632 0.22014427]
mean value: 0.21612906455993652
key: test_mcc
value: [ 0.56407607 0.875 0.74535599 0.71428571 0.57735027 0.8660254
0.8660254 -0.14285714 0.71428571 0.42857143]
mean value: 0.6208118858361914
key: train_mcc
value: [0.93745372 0.93748452 0.92288947 0.92288947 0.9379581 0.95417386
0.95324137 0.93933644 0.90802522 0.93933644]
mean value: 0.9352788622064109
key: test_accuracy
value: [0.73333333 0.93333333 0.85714286 0.85714286 0.78571429 0.92857143
0.92857143 0.42857143 0.85714286 0.71428571]
mean value: 0.8023809523809524
key: train_accuracy
value: [0.96850394 0.96850394 0.9609375 0.9609375 0.96875 0.9765625
0.9765625 0.96875 0.953125 0.96875 ]
mean value: 0.9671382874015748
key: test_fscore
value: [0.77777778 0.93333333 0.875 0.85714286 0.76923077 0.92307692
0.92307692 0.42857143 0.85714286 0.71428571]
mean value: 0.8058638583638583
key: train_fscore
value: [0.96923077 0.96875 0.96183206 0.96183206 0.96923077 0.97709924
0.97674419 0.96969697 0.95454545 0.96969697]
mean value: 0.967865847722607
key: test_precision
value: [0.63636364 1. 0.77777778 0.85714286 0.83333333 1.
1. 0.42857143 0.85714286 0.71428571]
mean value: 0.8104617604617604
key: train_precision
value: [0.95454545 0.95384615 0.94029851 0.94029851 0.95454545 0.95522388
0.96923077 0.94117647 0.92647059 0.94117647]
mean value: 0.9476812257101985
key: test_recall
value: [1. 0.875 1. 0.85714286 0.71428571 0.85714286
0.85714286 0.42857143 0.85714286 0.71428571]
mean value: 0.8160714285714286
key: train_recall
value: [0.984375 0.98412698 0.984375 0.984375 0.984375 1.
0.984375 1. 0.984375 1. ]
mean value: 0.9890376984126984
key: test_roc_auc
value: [0.75 0.9375 0.85714286 0.85714286 0.78571429 0.92857143
0.92857143 0.42857143 0.85714286 0.71428571]
mean value: 0.8044642857142857
key: train_roc_auc
value: [0.96837798 0.96862599 0.9609375 0.9609375 0.96875 0.9765625
0.9765625 0.96875 0.953125 0.96875 ]
mean value: 0.9671378968253969
key: test_jcc
value: [0.63636364 0.875 0.77777778 0.75 0.625 0.85714286
0.85714286 0.27272727 0.75 0.55555556]
mean value: 0.6956709956709957
key: train_jcc
value: [0.94029851 0.93939394 0.92647059 0.92647059 0.94029851 0.95522388
0.95454545 0.94117647 0.91304348 0.94117647]
mean value: 0.9378097885369711
MCC on Blind test: 0.29
Accuracy on Blind test: 0.63
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01670504 0.00717616 0.00681663 0.00686646 0.00743771 0.00687003
0.00733042 0.00698042 0.00724292 0.00684094]
mean value: 0.008026671409606934
key: score_time
value: [0.0158987 0.00824714 0.00798559 0.0079875 0.00846529 0.00800943
0.00837207 0.00800943 0.00845408 0.00799131]
mean value: 0.008942055702209472
key: test_mcc
value: [-0.19642857 0.47245559 0.63245553 -0.14285714 0. 0.4472136
0.4472136 0. 0.28867513 0.1490712 ]
mean value: 0.20977989331042105
key: train_mcc
value: [0.35590281 0.40535457 0.36154406 0.34995662 0.43771378 0.36480373
0.40704579 0.34391797 0.37665889 0.375 ]
mean value: 0.37778982176154485
key: test_accuracy
value: [0.4 0.73333333 0.78571429 0.42857143 0.5 0.71428571
0.71428571 0.5 0.64285714 0.57142857]
mean value: 0.599047619047619
key: train_accuracy
value: [0.67716535 0.7007874 0.6796875 0.671875 0.71875 0.6796875
0.703125 0.671875 0.6875 0.6875 ]
mean value: 0.6877952755905512
key: test_fscore
value: [0.4 0.77777778 0.82352941 0.42857143 0.53333333 0.66666667
0.75 0.53333333 0.66666667 0.5 ]
mean value: 0.6079878618113912
key: train_fscore
value: [0.6962963 0.71641791 0.6962963 0.7 0.71428571 0.70503597
0.71212121 0.67692308 0.70149254 0.6875 ]
mean value: 0.7006369014906811
key: test_precision
value: [0.375 0.7 0.7 0.42857143 0.5 0.8
0.66666667 0.5 0.625 0.6 ]
mean value: 0.5895238095238096
key: train_precision
value: [0.66197183 0.67605634 0.66197183 0.64473684 0.72580645 0.65333333
0.69117647 0.66666667 0.67142857 0.6875 ]
mean value: 0.6740648335734973
key: test_recall
value: [0.42857143 0.875 1. 0.42857143 0.57142857 0.57142857
0.85714286 0.57142857 0.71428571 0.42857143]
mean value: 0.6446428571428571
key: train_recall
value: [0.734375 0.76190476 0.734375 0.765625 0.703125 0.765625
0.734375 0.6875 0.734375 0.6875 ]
mean value: 0.7308779761904762
key: test_roc_auc
value: [0.40178571 0.72321429 0.78571429 0.42857143 0.5 0.71428571
0.71428571 0.5 0.64285714 0.57142857]
mean value: 0.5982142857142857
key: train_roc_auc
value: [0.67671131 0.70126488 0.6796875 0.671875 0.71875 0.6796875
0.703125 0.671875 0.6875 0.6875 ]
mean value: 0.687797619047619
key: test_jcc
value: [0.25 0.63636364 0.7 0.27272727 0.36363636 0.5
0.6 0.36363636 0.5 0.33333333]
mean value: 0.45196969696969697
key: train_jcc
value: [0.53409091 0.55813953 0.53409091 0.53846154 0.55555556 0.54444444
0.55294118 0.51162791 0.54022989 0.52380952]
mean value: 0.5393391383841405
MCC on Blind test: 0.39
Accuracy on Blind test: 0.69
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.06879473 0.03695536 0.03799057 0.03917074 0.03711796 0.03794122
0.04741096 0.0387218 0.04068446 0.03751087]
mean value: 0.042229866981506346
key: score_time
value: [0.00955296 0.00969815 0.0103898 0.0105443 0.01052046 0.01191258
0.01033878 0.01035762 0.01077509 0.0104599 ]
mean value: 0.010454964637756348
key: test_mcc
value: [0.66143783 1. 1. 0.8660254 0.71428571 1.
1. 0.57735027 0.8660254 0.71428571]
mean value: 0.8399410333096079
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8 1. 1. 0.92857143 0.85714286 1.
1. 0.78571429 0.92857143 0.85714286]
mean value: 0.9157142857142857
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 1. 1. 0.92307692 0.85714286 1.
1. 0.76923077 0.93333333 0.85714286]
mean value: 0.9163456151691446
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 1. 1. 1. 0.85714286 1.
1. 0.83333333 0.875 0.85714286]
mean value: 0.9122619047619047
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.85714286 0.85714286 1.
1. 0.71428571 1. 0.85714286]
mean value: 0.9285714285714286
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 1. 1. 0.92857143 0.85714286 1.
1. 0.78571429 0.92857143 0.85714286]
mean value: 0.9169642857142858
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 1. 1. 0.85714286 0.75 1.
1. 0.625 0.875 0.75 ]
mean value: 0.8557142857142856
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.04
Accuracy on Blind test: 0.48
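Cross-validated MCC and blind-test MCC rank the models reported so far quite differently. The sketch below tabulates two values per model, both copied from the log: the `mean value` of `test_mcc` and the `MCC on Blind test` line. It then picks the best model under each criterion. It is a reading aid only and not part of the original pipeline.

```python
# (cross-validated mean test MCC, blind-test MCC) per model, copied from the log.
results = {
    "Extra Trees":    (0.6574208945289839, 0.28),
    "Extra Tree":     (0.1870063634618141, 0.24),
    "Random Forest":  (0.7311023176363259, 0.20),
    "Random Forest2": (0.6208118858361914, 0.29),
    "Naive Bayes":    (0.20977989331042105, 0.39),
    "XGBoost":        (0.8399410333096079, -0.04),
}

best_cv = max(results, key=lambda name: results[name][0])
best_blind = max(results, key=lambda name: results[name][1])

print(f"{'model':<16}{'CV MCC':>8}{'blind MCC':>11}{'drop':>8}")
# Sort by cross-validated MCC, highest first; "drop" = CV MCC minus blind MCC.
for name, (cv_mcc, blind_mcc) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<16}{cv_mcc:>8.2f}{blind_mcc:>11.2f}{cv_mcc - blind_mcc:>8.2f}")
print(f"best by CV: {best_cv}; best on blind test: {best_blind}")
```

On these numbers XGBoost has the highest cross-validated MCC (0.84) but the lowest blind-test MCC (-0.04). Naive Bayes, second-lowest by cross-validation, scores highest on the blind test (0.39).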
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01021028 0.01137257 0.01115346 0.01160097 0.01162314 0.01166773
0.01157808 0.01191115 0.01153421 0.01156759]
mean value: 0.011421918869018555
key: score_time
value: [0.01013207 0.01012731 0.01019645 0.01036048 0.01039767 0.01046324
0.01033401 0.01034856 0.01044607 0.01029301]
mean value: 0.010309886932373048
key: test_mcc
value: [0.66143783 0.46428571 0.8660254 0.74535599 0.63245553 0.57735027
0.8660254 0.71428571 0.28867513 0.4472136 ]
mean value: 0.6263110587724456
key: train_mcc
value: [0.93745372 0.90550595 0.92198755 0.9375 0.92198755 0.95324137
0.95417386 0.9379581 0.95324137 0.90669283]
mean value: 0.9329742313821212
key: test_accuracy
value: [0.8 0.73333333 0.92857143 0.85714286 0.78571429 0.78571429
0.92857143 0.85714286 0.64285714 0.71428571]
mean value: 0.8033333333333333
key: train_accuracy
value: [0.96850394 0.95275591 0.9609375 0.96875 0.9609375 0.9765625
0.9765625 0.96875 0.9765625 0.953125 ]
mean value: 0.9663447342519685
key: test_fscore
value: [0.82352941 0.75 0.92307692 0.83333333 0.72727273 0.76923077
0.92307692 0.85714286 0.66666667 0.66666667]
mean value: 0.7939996278231571
key: train_fscore
value: [0.96923077 0.95238095 0.96062992 0.96875 0.96062992 0.97674419
0.97709924 0.96923077 0.97674419 0.95384615]
mean value: 0.9665286095942575
key: test_precision
value: [0.7 0.75 1. 1. 1. 0.83333333
1. 0.85714286 0.625 0.8 ]
mean value: 0.856547619047619
key: train_precision
value: [0.95454545 0.95238095 0.96825397 0.96875 0.96825397 0.96923077
0.95522388 0.95454545 0.96923077 0.93939394]
mean value: 0.9599809156432291
key: test_recall
value: [1. 0.75 0.85714286 0.71428571 0.57142857 0.71428571
0.85714286 0.85714286 0.71428571 0.57142857]
mean value: 0.7607142857142857
key: train_recall
value: [0.984375 0.95238095 0.953125 0.96875 0.953125 0.984375
1. 0.984375 0.984375 0.96875 ]
mean value: 0.9733630952380953
key: test_roc_auc
value: [0.8125 0.73214286 0.92857143 0.85714286 0.78571429 0.78571429
0.92857143 0.85714286 0.64285714 0.71428571]
mean value: 0.8044642857142857
key: train_roc_auc
value: [0.96837798 0.95275298 0.9609375 0.96875 0.9609375 0.9765625
0.9765625 0.96875 0.9765625 0.953125 ]
mean value: 0.9663318452380952
key: test_jcc
value: [0.7 0.6 0.85714286 0.71428571 0.57142857 0.625
0.85714286 0.75 0.5 0.5 ]
mean value: 0.6675
key: train_jcc
value: [0.94029851 0.90909091 0.92424242 0.93939394 0.92424242 0.95454545
0.95522388 0.94029851 0.95454545 0.91176471]
mean value: 0.9353646207465347
MCC on Blind test: 0.02
Accuracy on Blind test: 0.51
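The per-fold metrics are mutually consistent, so a fold's confusion matrix can be recovered from them. For LDA fold 1 (15 test rows, recall 1.0, precision 0.7, accuracy 0.8) the only counts that fit are TP=7, FP=3, TN=5, FN=0. These counts are inferred, not printed by the script. The sketch recomputes the fold's logged metrics from them; the balanced accuracy it reports also equals the logged test_roc_auc of 0.8125.

```python
from math import sqrt

def fold_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts.
    No guard against zero denominators (e.g. a fold with no predicted positives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # true-positive rate
    specificity = tn / (tn + fp)         # true-negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),      # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
               / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Counts inferred (not logged) for LDA fold 1
m = fold_metrics(tp=7, fp=3, tn=5, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
```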
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01899028 0.00702119 0.00694108 0.00665832 0.00675297 0.00682497
0.00661945 0.00672174 0.00678945 0.00670505]
mean value: 0.008002448081970214
key: score_time
value: [0.01023316 0.00807333 0.0078764 0.00824809 0.00777531 0.00779223
0.0078361 0.00779271 0.00800109 0.00779223]
mean value: 0.00814206600189209
key: test_mcc
value: [ 0.56407607 0.6000992 0.40824829 0.14285714 0. 0.4472136
0.28867513 -0.1490712 0. -0.1490712 ]
mean value: 0.21530270393825499
key: train_mcc
value: [0.40417056 0.34191645 0.4113018 0.47245559 0.39067269 0.34646743
0.29691125 0.43943537 0.4429404 0.39105486]
mean value: 0.39373264045213846
key: test_accuracy
value: [0.73333333 0.8 0.64285714 0.57142857 0.5 0.71428571
0.64285714 0.42857143 0.5 0.42857143]
mean value: 0.5961904761904762
key: train_accuracy
value: [0.7007874 0.66929134 0.703125 0.734375 0.6953125 0.671875
0.6484375 0.71875 0.71875 0.6953125 ]
mean value: 0.695601624015748
key: test_fscore
value: [0.77777778 0.82352941 0.73684211 0.57142857 0.58823529 0.66666667
0.66666667 0.33333333 0.53333333 0.33333333]
mean value: 0.6031146493685193
key: train_fscore
value: [0.72058824 0.68656716 0.72463768 0.75 0.69767442 0.69117647
0.65116279 0.73134328 0.73913043 0.70229008]
mean value: 0.7094570555223779
key: test_precision
value: [0.63636364 0.77777778 0.58333333 0.57142857 0.5 0.8
0.625 0.4 0.5 0.4 ]
mean value: 0.579390331890332
key: train_precision
value: [0.68055556 0.64788732 0.67567568 0.70833333 0.69230769 0.65277778
0.64615385 0.7 0.68918919 0.68656716]
mean value: 0.6779447558115836
key: test_recall
value: [1. 0.875 1. 0.57142857 0.71428571 0.57142857
0.71428571 0.28571429 0.57142857 0.28571429]
mean value: 0.6589285714285714
key: train_recall
value: [0.765625 0.73015873 0.78125 0.796875 0.703125 0.734375
0.65625 0.765625 0.796875 0.71875 ]
mean value: 0.744890873015873
key: test_roc_auc
value: [0.75 0.79464286 0.64285714 0.57142857 0.5 0.71428571
0.64285714 0.42857143 0.5 0.42857143]
mean value: 0.5973214285714286
key: train_roc_auc
value: [0.70027282 0.66976687 0.703125 0.734375 0.6953125 0.671875
0.6484375 0.71875 0.71875 0.6953125 ]
mean value: 0.6955977182539682
key: test_jcc
value: [0.63636364 0.7 0.58333333 0.4 0.41666667 0.5
0.5 0.2 0.36363636 0.2 ]
mean value: 0.45
key: train_jcc
value: [0.56321839 0.52272727 0.56818182 0.6 0.53571429 0.52808989
0.48275862 0.57647059 0.5862069 0.54117647]
mean value: 0.5504544231133333
MCC on Blind test: 0.38
Accuracy on Blind test: 0.69
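The test_jcc column does not carry information beyond test_fscore. For a single positive label, J = TP/(TP+FP+FN) and F1 = 2TP/(2TP+FP+FN), which gives J = F1/(2 − F1). The sketch below checks this identity against the MultinomialNB folds above, using the rounded values as printed.

```python
# Per-fold test_fscore and test_jcc copied from the MultinomialNB block
test_fscore = [0.77777778, 0.82352941, 0.73684211, 0.57142857, 0.58823529,
               0.66666667, 0.66666667, 0.33333333, 0.53333333, 0.33333333]
test_jcc = [0.63636364, 0.7, 0.58333333, 0.4, 0.41666667,
            0.5, 0.5, 0.2, 0.36363636, 0.2]

def jaccard_from_f1(f1):
    """Jaccard index of one label, derived from its F1 score: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)

for f1, jcc in zip(test_fscore, test_jcc):
    print(f"F1={f1:.8f}  J from F1={jaccard_from_f1(f1):.8f}  logged J={jcc:.8f}")
```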
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00755644 0.00730038 0.00778127 0.00782037 0.00711942 0.00752926
0.00717807 0.0077436 0.00730443 0.00738955]
mean value: 0.00747227668762207
key: score_time
value: [0.00840449 0.00777626 0.00787759 0.00786304 0.00776672 0.00783229
0.00777388 0.00788474 0.00788546 0.00791168]
mean value: 0.007897615432739258
key: test_mcc
value: [0.56407607 0.60714286 0.57735027 0.40824829 0.40824829 0.1490712
1. 0.28867513 0.57735027 0.31622777]
mean value: 0.48963901503792373
key: train_mcc
value: [0.69592496 0.84250992 0.92198755 0.72374686 0.60141677 0.78756153
0.62554324 0.84375 0.8226036 0.7617394 ]
mean value: 0.7626783838444016
key: test_accuracy
value: [0.73333333 0.8 0.78571429 0.64285714 0.64285714 0.57142857
1. 0.64285714 0.78571429 0.64285714]
mean value: 0.7247619047619047
key: train_accuracy
value: [0.82677165 0.92125984 0.9609375 0.84375 0.765625 0.8828125
0.78125 0.921875 0.90625 0.8671875 ]
mean value: 0.8677718996062992
key: test_fscore
value: [0.77777778 0.8 0.76923077 0.44444444 0.44444444 0.625
1. 0.61538462 0.76923077 0.70588235]
mean value: 0.6951395173453997
key: train_fscore
value: [0.85333333 0.92063492 0.96124031 0.81481481 0.69387755 0.8951049
0.82051282 0.921875 0.89830508 0.88275862]
mean value: 0.866245735093413
key: test_precision
value: [0.63636364 0.85714286 0.83333333 1. 1. 0.55555556
1. 0.66666667 0.83333333 0.6 ]
mean value: 0.7982395382395382
key: train_precision
value: [0.74418605 0.92063492 0.95384615 1. 1. 0.81012658
0.69565217 0.921875 0.98148148 0.79012346]
mean value: 0.8817925815455832
key: test_recall
value: [1. 0.75 0.71428571 0.28571429 0.28571429 0.71428571
1. 0.57142857 0.71428571 0.85714286]
mean value: 0.6892857142857143
key: train_recall
value: [1. 0.92063492 0.96875 0.6875 0.53125 1.
1. 0.921875 0.828125 1. ]
mean value: 0.885813492063492
key: test_roc_auc
value: [0.75 0.80357143 0.78571429 0.64285714 0.64285714 0.57142857
1. 0.64285714 0.78571429 0.64285714]
mean value: 0.7267857142857143
key: train_roc_auc
value: [0.82539683 0.92125496 0.9609375 0.84375 0.765625 0.8828125
0.78125 0.921875 0.90625 0.8671875 ]
mean value: 0.8676339285714285
key: test_jcc
value: [0.63636364 0.66666667 0.625 0.28571429 0.28571429 0.45454545
1. 0.44444444 0.625 0.54545455]
mean value: 0.5568903318903319
key: train_jcc
value: [0.74418605 0.85294118 0.92537313 0.6875 0.53125 0.81012658
0.69565217 0.85507246 0.81538462 0.79012346]
mean value: 0.7707609649444953
MCC on Blind test: 0.28
Accuracy on Blind test: 0.63
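The logged test_roc_auc values match balanced accuracy, (TPR + TNR)/2, fold by fold. This is what ROC AUC reduces to when it is computed from hard 0/1 predictions rather than from probabilities or decision scores. That the script scores hard labels is an assumption based on this agreement. The sketch computes AUC as the probability that a positive outscores a negative, with ties counted as 0.5. It uses hypothetical label vectors that reproduce the counts inferred for PassiveAggressive fold 2 (TP=6, FN=2, TN=6, FP=1, giving the logged recall 0.75, precision 0.85714286 and accuracy 0.8).

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical vectors matching the inferred fold-2 counts (15 test rows)
y_true = [1] * 8 + [0] * 7
y_pred = [1] * 6 + [0] * 2 + [0] * 6 + [1] * 1   # hard 0/1 predictions

tpr = 6 / 8   # recall
tnr = 6 / 7   # specificity
print(pairwise_auc(y_true, y_pred), (tpr + tnr) / 2)
```

With hard labels, every positive/negative pair is either a win (1 vs 0), a loss (0 vs 1) or a tie, and the tie weighting makes the AUC collapse to (TPR + TNR)/2.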
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00977015 0.00938821 0.00748348 0.00760007 0.00707746 0.00716782
0.00706244 0.00752711 0.00705504 0.00710464]
mean value: 0.0077236413955688475
key: score_time
value: [0.01031303 0.00984597 0.00794601 0.00786161 0.00812817 0.00785327
0.00777006 0.008039 0.0077517 0.0078032 ]
mean value: 0.008331203460693359
key: test_mcc
value: [0.46770717 0.76376262 0.57735027 0.57735027 0.52223297 0.74535599
0.8660254 0.40824829 0.57735027 0.4472136 ]
mean value: 0.5952596846856876
key: train_mcc
value: [0.70849191 0.58496906 0.8138413 0.8542422 0.63764677 0.60141677
0.81409158 0.72374686 0.72932496 0.90669283]
mean value: 0.7374464237869991
key: test_accuracy
value: [0.66666667 0.86666667 0.78571429 0.78571429 0.71428571 0.85714286
0.92857143 0.64285714 0.78571429 0.71428571]
mean value: 0.7747619047619048
key: train_accuracy
value: [0.83464567 0.75590551 0.8984375 0.921875 0.7890625 0.765625
0.90625 0.84375 0.8515625 0.953125 ]
mean value: 0.8520238681102362
key: test_fscore
value: [0.73684211 0.85714286 0.8 0.76923077 0.6 0.83333333
0.93333333 0.73684211 0.76923077 0.75 ]
mean value: 0.7785955272797378
key: train_fscore
value: [0.8590604 0.67368421 0.90780142 0.92753623 0.73267327 0.69387755
0.90909091 0.86486486 0.82882883 0.95384615]
mean value: 0.8351263838512551
key: test_precision
value: [0.58333333 1. 0.75 0.83333333 1. 1.
0.875 0.58333333 0.83333333 0.66666667]
mean value: 0.8125
key: train_precision
value: [0.75294118 1. 0.83116883 0.86486486 1. 1.
0.88235294 0.76190476 0.9787234 0.93939394]
mean value: 0.9011349919234776
key: test_recall
value: [1. 0.75 0.85714286 0.71428571 0.42857143 0.71428571
1. 1. 0.71428571 0.85714286]
mean value: 0.8035714285714286
key: train_recall
value: [1. 0.50793651 1. 1. 0.578125 0.53125
0.9375 1. 0.71875 0.96875 ]
mean value: 0.8242311507936508
key: test_roc_auc
value: [0.6875 0.875 0.78571429 0.78571429 0.71428571 0.85714286
0.92857143 0.64285714 0.78571429 0.71428571]
mean value: 0.7776785714285714
key: train_roc_auc
value: [0.83333333 0.75396825 0.8984375 0.921875 0.7890625 0.765625
0.90625 0.84375 0.8515625 0.953125 ]
mean value: 0.8516989087301587
key: test_jcc
value: [0.58333333 0.75 0.66666667 0.625 0.42857143 0.71428571
0.875 0.58333333 0.625 0.6 ]
mean value: 0.6451190476190476
key: train_jcc
value: [0.75294118 0.50793651 0.83116883 0.86486486 0.578125 0.53125
0.83333333 0.76190476 0.70769231 0.91176471]
mean value: 0.7280981489253548
MCC on Blind test: 0.16
Accuracy on Blind test: 0.57
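The fold sizes can be read off the fractions in the log. SGD's first train_accuracy, 0.83464567, is 106/127, and its third, 0.8984375, is 115/128. This suggests 127 or 128 training rows per fold and about 142 rows in total. That total is an inference; the script does not print it. Under scikit-learn's documented KFold sizing rule, where the first n_samples % n_splits folds receive one extra sample, 142 rows in 10 folds give two 15-row and eight 14-row test folds. This is consistent with test scores such as 10/15 and 11/14 above.

```python
def fold_sizes(n_samples, n_splits):
    """Test-fold sizes under the KFold rule: the first n_samples % n_splits
    folds get one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

# n = 142 is inferred from logged fractions (e.g. 106/127, 115/128), not printed
n = 142
test_sizes = fold_sizes(n, 10)
train_sizes = [n - t for t in test_sizes]
print(test_sizes)
print(train_sizes)
```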
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.0766685 0.06215739 0.06248546 0.06247211 0.06284809 0.0624218
0.06243992 0.06250739 0.06269693 0.06271696]
mean value: 0.06394145488739014
key: score_time
value: [0.01422691 0.0138514 0.01398492 0.01401758 0.01408148 0.01418185
0.01409602 0.01414037 0.01412201 0.01409864]
mean value: 0.014080119132995606
key: test_mcc
value: [0.66143783 1. 0.8660254 1. 0.8660254 0.8660254
1. 0.28867513 0.71428571 0.57735027]
mean value: 0.7839825157189617
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8 1. 0.92857143 1. 0.92857143 0.92857143
1. 0.64285714 0.85714286 0.78571429]
mean value: 0.8871428571428571
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 1. 0.92307692 1. 0.93333333 0.93333333
1. 0.66666667 0.85714286 0.76923077]
mean value: 0.8906313294548589
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 1. 1. 1. 0.875 0.875
1. 0.625 0.85714286 0.83333333]
mean value: 0.876547619047619
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.85714286 1. 1. 1.
1. 0.71428571 0.85714286 0.71428571]
mean value: 0.9142857142857143
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 1. 0.92857143 1. 0.92857143 0.92857143
1. 0.64285714 0.85714286 0.78571429]
mean value: 0.8883928571428572
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 1. 0.85714286 1. 0.875 0.875
1. 0.5 0.75 0.625 ]
mean value: 0.8182142857142857
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: -0.06
Accuracy on Blind test: 0.47
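The AdaBoost block shows a pattern worth quantifying. It scores perfectly on every training fold, lower on the cross-validated test folds, and close to chance on the blind test. The sketch computes both drops from the logged means. The blind-test figures are printed to two decimals, so the second drop is only accurate to about ±0.005.

```python
# Mean values copied from the AdaBoost block above
cv_means = {
    "mcc": {"train": 1.0, "test": 0.7839825157189617},
    "accuracy": {"train": 1.0, "test": 0.8871428571428571},
}
blind = {"mcc": -0.06, "accuracy": 0.47}   # printed rounded to 2 decimals

for metric, m in cv_means.items():
    cv_gap = m["train"] - m["test"]          # train folds vs held-out CV folds
    blind_drop = m["test"] - blind[metric]   # held-out CV folds vs blind test
    print(f"{metric}: train-test gap {cv_gap:.3f}, CV-test to blind drop {blind_drop:.3f}")
```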
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.02604771 0.02509189 0.03977823 0.03567004 0.04593778 0.0436461
0.04469895 0.03649855 0.02204108 0.02417731]
mean value: 0.034358763694763185
key: score_time
value: [0.01960158 0.01544213 0.03531432 0.02577138 0.03630829 0.03665662
0.03008604 0.02398133 0.01645422 0.02618575]
mean value: 0.026580166816711426
key: test_mcc
value: [0.66143783 0.875 1. 0.8660254 0.4472136 0.71428571
1. 0.31622777 0.71428571 0.8660254 ]
mean value: 0.7460501425423249
key: train_mcc
value: [1. 0.9689752 0.96922337 1. 0.95324137 1.
1. 1. 1. 0.98449518]
mean value: 0.9875935124101023
key: test_accuracy
value: [0.8 0.93333333 1. 0.92857143 0.71428571 0.85714286
1. 0.64285714 0.85714286 0.92857143]
mean value: 0.8661904761904762
key: train_accuracy
value: [1. 0.98425197 0.984375 1. 0.9765625 1.
1. 1. 1. 0.9921875 ]
mean value: 0.9937376968503937
key: test_fscore
value: [0.82352941 0.93333333 1. 0.92307692 0.66666667 0.85714286
1. 0.70588235 0.85714286 0.92307692]
mean value: 0.8689851325145442
key: train_fscore
value: [1. 0.98387097 0.98412698 1. 0.97637795 1.
1. 1. 1. 0.99212598]
mean value: 0.9936501888876793
key: test_precision
value: [0.7 1. 1. 1. 0.8 0.85714286
1. 0.6 0.85714286 1. ]
mean value: 0.8814285714285715
key: train_precision
value: [1. 1. 1. 1. 0.98412698 1.
1. 1. 1. 1. ]
mean value: 0.9984126984126984
key: test_recall
value: [1. 0.875 1. 0.85714286 0.57142857 0.85714286
1. 0.85714286 0.85714286 0.85714286]
mean value: 0.8732142857142857
key: train_recall
value: [1. 0.96825397 0.96875 1. 0.96875 1.
1. 1. 1. 0.984375 ]
mean value: 0.9890128968253968
key: test_roc_auc
value: [0.8125 0.9375 1. 0.92857143 0.71428571 0.85714286
1. 0.64285714 0.85714286 0.92857143]
mean value: 0.8678571428571429
key: train_roc_auc
value: [1. 0.98412698 0.984375 1. 0.9765625 1.
1. 1. 1. 0.9921875 ]
mean value: 0.9937251984126985
key: test_jcc
value: [0.7 0.875 1. 0.85714286 0.5 0.75
1. 0.54545455 0.75 0.85714286]
mean value: 0.7834740259740259
key: train_jcc
value: [1. 0.96825397 0.96875 1. 0.95384615 1.
1. 1. 1. 0.984375 ]
mean value: 0.9875225122100122
MCC on Blind test: 0.07
Accuracy on Blind test: 0.53
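To compare the blind-test results of the models logged in this excerpt, the printed MCC and accuracy values can be collected and sorted. The sketch covers only the named models in this section, and the model labels are copied exactly as the script prints them.

```python
# Blind-test (MCC, accuracy) as printed in this section of the log (2 decimals)
blind_results = {
    "LDA": (0.02, 0.51),
    "Multinomial": (0.38, 0.69),
    "Passive Aggresive": (0.28, 0.63),
    "Stochastic GDescent": (0.16, 0.57),
    "AdaBoost Classifier": (-0.06, 0.47),
    "Bagging Classifier": (0.07, 0.53),
}

# Sort by blind-test MCC, highest first
ranking = sorted(blind_results.items(), key=lambda item: item[1][0], reverse=True)
for name, (mcc, acc) in ranking:
    print(f"{name:22s} MCC={mcc:+.2f} accuracy={acc:.2f}")
```

MCC is used as the sort key because, unlike accuracy, it takes all four confusion-matrix cells into account.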
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.029562 0.03599644 0.0356493 0.03583241 0.035743 0.01903772
0.02019906 0.02188826 0.0147891 0.04324818]
mean value: 0.02919454574584961
key: score_time
value: [0.02010798 0.01939225 0.01914334 0.01897407 0.01091695 0.01093078
0.01089573 0.01088786 0.01081705 0.01087356]
mean value: 0.014293956756591796
key: test_mcc
value: [ 0.60714286 0.46428571 0.14285714 0.28867513 -0.1490712 0.4472136
0.28867513 0.14285714 0. 0. ]
mean value: 0.22326355233324546
key: train_mcc
value: [0.95287698 0.96850198 0.95324137 0.96922337 0.9379581 0.98449518
0.92198755 0.95417386 0.95417386 0.96875 ]
mean value: 0.9565382271774051
key: test_accuracy
value: [0.8 0.73333333 0.57142857 0.64285714 0.42857143 0.71428571
0.64285714 0.57142857 0.5 0.5 ]
mean value: 0.6104761904761905
key: train_accuracy
value: [0.97637795 0.98425197 0.9765625 0.984375 0.96875 0.9921875
0.9609375 0.9765625 0.9765625 0.984375 ]
mean value: 0.9780942421259843
key: test_fscore
value: [0.8 0.75 0.57142857 0.61538462 0.33333333 0.66666667
0.61538462 0.57142857 0.46153846 0.36363636]
mean value: 0.5748801198801199
key: train_fscore
value: [0.97637795 0.98412698 0.97637795 0.98412698 0.96825397 0.99224806
0.96062992 0.976 0.976 0.984375 ]
mean value: 0.9778516825295094
key: test_precision
value: [0.75 0.75 0.57142857 0.66666667 0.4 0.8
0.66666667 0.57142857 0.5 0.5 ]
mean value: 0.6176190476190476
key: train_precision
value: [0.98412698 0.98412698 0.98412698 1. 0.98387097 0.98461538
0.96825397 1. 1. 0.984375 ]
mean value: 0.987349627299224
key: test_recall
value: [0.85714286 0.75 0.57142857 0.57142857 0.28571429 0.57142857
0.57142857 0.57142857 0.42857143 0.28571429]
mean value: 0.5464285714285714
key: train_recall
value: [0.96875 0.98412698 0.96875 0.96875 0.953125 1.
0.953125 0.953125 0.953125 0.984375 ]
mean value: 0.9687251984126984
key: test_roc_auc
value: [0.80357143 0.73214286 0.57142857 0.64285714 0.42857143 0.71428571
0.64285714 0.57142857 0.5 0.5 ]
mean value: 0.6107142857142857
key: train_roc_auc
value: [0.97643849 0.98425099 0.9765625 0.984375 0.96875 0.9921875
0.9609375 0.9765625 0.9765625 0.984375 ]
mean value: 0.9781001984126985
key: test_jcc
value: [0.66666667 0.6 0.4 0.44444444 0.2 0.5
0.44444444 0.4 0.3 0.22222222]
mean value: 0.41777777777777775
key: train_jcc
value: [0.95384615 0.96875 0.95384615 0.96875 0.93846154 0.98461538
0.92424242 0.953125 0.953125 0.96923077]
mean value: 0.9567992424242424
MCC on Blind test: 0.29
Accuracy on Blind test: 0.64
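The per-fold scores above are consistent with a single confusion matrix per fold. For example, the first Gaussian Process test fold (15 samples) matches TP=6, FN=1, FP=2, TN=6. These counts are inferred from the logged precision (0.75 = 6/8) and recall (0.857 = 6/7); the log does not print them. The sketch below recomputes every logged metric from those counts. `test_roc_auc` is reproduced only if the scorer used hard 0/1 predictions, in which case ROC AUC reduces to balanced accuracy. That is an assumption about how the script scored the folds, although it matches the logged value.

```python
import math

def fold_metrics(tp, fn, fp, tn):
    """Recompute the logged binary metrics from one fold's confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        # ROC AUC of hard 0/1 predictions equals balanced accuracy
        "roc_auc": (recall + specificity) / 2,
        "jcc": tp / (tp + fp + fn),    # Jaccard score of the positive class
    }

# Counts inferred (not logged) for fold 1 of the Gaussian Process run
m = fold_metrics(tp=6, fn=1, fp=2, tn=6)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
```

Each printed value matches the first entry of the corresponding `test_*` array for this model.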
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.1164434 0.10044718 0.10379887 0.09790683 0.10323715 0.10379982
0.10525608 0.10259223 0.1029501 0.10186577]
mean value: 0.10382974147796631
key: score_time
value: [0.00966978 0.00841308 0.00894523 0.00920153 0.00936794 0.00911546
0.00893807 0.00919414 0.00939775 0.00913119]
mean value: 0.009137415885925293
key: test_mcc
value: [0.66143783 1. 1. 0.8660254 0.8660254 0.8660254
1. 0.4472136 0.8660254 0.42857143]
mean value: 0.8001324466975289
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.8 1. 1. 0.92857143 0.92857143 0.92857143
1. 0.71428571 0.92857143 0.71428571]
mean value: 0.8942857142857144
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 1. 1. 0.92307692 0.93333333 0.93333333
1. 0.75 0.93333333 0.71428571]
mean value: 0.9010892049127344
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7 1. 1. 1. 0.875 0.875
1. 0.66666667 0.875 0.71428571]
mean value: 0.8705952380952381
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 0.85714286 1. 1.
1. 0.85714286 1. 0.71428571]
mean value: 0.9428571428571428
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8125 1. 1. 0.92857143 0.92857143 0.92857143
1. 0.71428571 0.92857143 0.71428571]
mean value: 0.8955357142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 1. 1. 0.85714286 0.875 0.875
1. 0.6 0.875 0.55555556]
mean value: 0.8337698412698412
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.51
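Gradient Boosting shows a large gap between cross-validated and blind-test performance. It fits every training fold perfectly (all `train_*` scores are 1.0) and averages a test MCC of about 0.80, yet it scores only 0.04 MCC on the blind test set. The minimal stdlib sketch below summarises that gap from the logged fold values. The fold values and the blind-test MCC are copied from this log. Using the sample standard deviation as the spread measure is an editorial choice, not something the original script reports.

```python
import statistics

# test_mcc per fold for Gradient Boosting, copied from the log above
test_mcc = [0.66143783, 1.0, 1.0, 0.8660254, 0.8660254, 0.8660254,
            1.0, 0.4472136, 0.8660254, 0.42857143]
blind_mcc = 0.04  # "MCC on Blind test" for the same model

cv_mean = statistics.mean(test_mcc)
cv_sd = statistics.stdev(test_mcc)  # sample standard deviation across folds
gap = cv_mean - blind_mcc

print(f"CV MCC: {cv_mean:.3f} +/- {cv_sd:.3f}")
print(f"Blind-test MCC: {blind_mcc:.2f} (gap {gap:.2f})")
```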
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00956202 0.01067328 0.01087284 0.01171899 0.0111239 0.01345611
0.01117897 0.01136255 0.011199 0.01894307]
mean value: 0.012009072303771972
key: score_time
value: [0.01031303 0.01022744 0.01023555 0.01084542 0.01065707 0.01072264
0.01063037 0.01066041 0.0107913 0.01101661]
mean value: 0.010609984397888184
key: test_mcc
value: [0.66143783 0.18898224 0.40824829 0.52223297 0.4472136 0.28867513
0.52223297 0.17407766 0.2773501 0.17407766]
mean value: 0.36645284305875925
key: train_mcc
value: [0.70849191 0.59989919 0.64978629 0.57735027 0.76571848 0.73658951
0.58937969 0.7617394 0.71641857 0.71125407]
mean value: 0.6816627369382001
key: test_accuracy
value: [0.8 0.6 0.64285714 0.71428571 0.71428571 0.64285714
0.71428571 0.57142857 0.57142857 0.57142857]
mean value: 0.6542857142857142
key: train_accuracy
value: [0.83464567 0.76377953 0.796875 0.75 0.8828125 0.859375
0.7578125 0.8671875 0.84375 0.8359375 ]
mean value: 0.8192175196850393
key: test_fscore
value: [0.82352941 0.66666667 0.73684211 0.77777778 0.66666667 0.61538462
0.77777778 0.66666667 0.7 0.66666667]
mean value: 0.7097978354634701
key: train_fscore
value: [0.8590604 0.80769231 0.83116883 0.8 0.88188976 0.87323944
0.80503145 0.88275862 0.8630137 0.8590604 ]
mean value: 0.8462914910490185
key: test_precision
value: [0.7 0.6 0.58333333 0.63636364 0.8 0.66666667
0.63636364 0.54545455 0.53846154 0.54545455]
mean value: 0.6252097902097902
key: train_precision
value: [0.75294118 0.67741935 0.71111111 0.66666667 0.88888889 0.79487179
0.67368421 0.79012346 0.76829268 0.75294118]
mean value: 0.7476940519561616
key: test_recall
value: [1. 0.75 1. 1. 0.57142857 0.57142857
1. 0.85714286 1. 0.85714286]
mean value: 0.8607142857142857
key: train_recall
value: [1. 1. 1. 1. 0.875 0.96875 1. 1.
0.984375 1. ]
mean value: 0.9828125
key: test_roc_auc
value: [0.8125 0.58928571 0.64285714 0.71428571 0.71428571 0.64285714
0.71428571 0.57142857 0.57142857 0.57142857]
mean value: 0.6544642857142857
key: train_roc_auc
value: [0.83333333 0.765625 0.796875 0.75 0.8828125 0.859375
0.7578125 0.8671875 0.84375 0.8359375 ]
mean value: 0.8192708333333334
key: test_jcc
value: [0.7 0.5 0.58333333 0.63636364 0.5 0.44444444
0.63636364 0.5 0.53846154 0.5 ]
mean value: 0.5538966588966588
key: train_jcc
value: [0.75294118 0.67741935 0.71111111 0.66666667 0.78873239 0.775
0.67368421 0.79012346 0.75903614 0.75294118]
mean value: 0.7347655691818613
MCC on Blind test: 0.27
Accuracy on Blind test: 0.64
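The repeated "Variables are collinear" warnings from QDA are expected with this preprocessing. QDA estimates a separate covariance matrix for each class. The OneHotEncoder output for each categorical column sums to exactly 1 in every row, so those dummy columns are linearly dependent within any class subset, and that class's covariance matrix is singular. With roughly 64 samples per class in each training fold, the encoded feature count may also approach the per-class sample count, but the exact encoded width is not shown in this log. The stdlib sketch below demonstrates only the one-hot point on a toy three-level categorical column. The data and level names are invented for illustration.

```python
# Toy categorical column with three levels (invented data)
levels = ["helix", "sheet", "coil"]  # hypothetical ss_class-style values
column = ["helix", "coil", "coil", "sheet", "helix", "coil"]

# Full one-hot encoding, as OneHotEncoder produces with its default drop=None
onehot = [[1.0 if value == level else 0.0 for level in levels] for value in column]

def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

cov = covariance(onehot)
# Each row of the covariance matrix sums to zero because the dummy columns
# always sum to 1, so cov @ [1, 1, 1] = 0 and the matrix is singular.
row_sums = [sum(row) for row in cov]
print(row_sums)
```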
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01034141 0.01005673 0.00851965 0.00838447 0.00835919 0.00818896
0.00820589 0.00816202 0.00818706 0.00815248]
mean value: 0.008655786514282227
key: score_time
value: [0.01045799 0.00901937 0.00879979 0.00867701 0.00855756 0.00852752
0.00862861 0.00858521 0.00861907 0.00860906]
mean value: 0.008848118782043456
key: test_mcc
value: [0.66143783 0.76376262 0.8660254 0.63245553 0.74535599 0.74535599
1. 0.42857143 0.74535599 0.42857143]
mean value: 0.7016892214052882
key: train_mcc
value: [0.87447286 0.88988095 0.85947992 0.875 0.87542756 0.90669283
0.87542756 0.84375 0.89073374 0.89073374]
mean value: 0.8781599163560809
key: test_accuracy
value: [0.8 0.86666667 0.92857143 0.78571429 0.85714286 0.85714286
1. 0.71428571 0.85714286 0.71428571]
mean value: 0.8380952380952381
key: train_accuracy
value: [0.93700787 0.94488189 0.9296875 0.9375 0.9375 0.953125
0.9375 0.921875 0.9453125 0.9453125 ]
mean value: 0.9389702263779528
key: test_fscore
value: [0.82352941 0.85714286 0.92307692 0.72727273 0.83333333 0.83333333
1. 0.71428571 0.875 0.71428571]
mean value: 0.8301260014495309
key: train_fscore
value: [0.93650794 0.94488189 0.92913386 0.9375 0.93846154 0.95384615
0.93846154 0.921875 0.94573643 0.94573643]
mean value: 0.9392140783525718
key: test_precision
value: [0.7 1. 1. 1. 1. 1.
1. 0.71428571 0.77777778 0.71428571]
mean value: 0.8906349206349207
key: train_precision
value: [0.9516129 0.9375 0.93650794 0.9375 0.92424242 0.93939394
0.92424242 0.921875 0.93846154 0.93846154]
mean value: 0.9349797704535607
key: test_recall
value: [1. 0.75 0.85714286 0.57142857 0.71428571 0.71428571
1. 0.71428571 1. 0.71428571]
mean value: 0.8035714285714286
key: train_recall
value: [0.921875 0.95238095 0.921875 0.9375 0.953125 0.96875
0.953125 0.921875 0.953125 0.953125 ]
mean value: 0.9436755952380952
key: test_roc_auc
value: [0.8125 0.875 0.92857143 0.78571429 0.85714286 0.85714286
1. 0.71428571 0.85714286 0.71428571]
mean value: 0.8401785714285714
key: train_roc_auc
value: [0.93712798 0.94494048 0.9296875 0.9375 0.9375 0.953125
0.9375 0.921875 0.9453125 0.9453125 ]
mean value: 0.9389880952380952
key: test_jcc
value: [0.7 0.75 0.85714286 0.57142857 0.71428571 0.71428571
1. 0.55555556 0.77777778 0.55555556]
mean value: 0.7196031746031746
key: train_jcc
value: [0.88059701 0.89552239 0.86764706 0.88235294 0.88405797 0.91176471
0.88405797 0.85507246 0.89705882 0.89705882]
mean value: 0.8855190161723353
MCC on Blind test: 0.21
Accuracy on Blind test: 0.6
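Each metric in this log is printed as a `key:` line, then a `value:` array that may wrap over several lines, then a `mean value:` line. Because of that fixed layout, the fold scores can be read back into Python for further analysis. The parser below is a minimal sketch written against the layout visible here; it is not part of the original scripts. It assumes that stderr warnings have not been interleaved into a block, which does happen in a few places in this log.

```python
import re

# Matches numpy-style floats such as "0.8", "1.", "-0.1490712", "1.2e-05"
NUMBER = re.compile(r"-?\d+(?:\.\d*)?(?:e[-+]?\d+)?")

def parse_cv_log(text):
    """Parse 'key:' / 'value: [...]' / 'mean value:' triples from the log."""
    folds, means = {}, {}
    key, buffer = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key, buffer = line[len("key:"):].strip(), None
        elif line.startswith("mean value:") and key is not None:
            means[key] = float(line[len("mean value:"):])
        elif line.startswith("value:") and key is not None:
            buffer = line[len("value:"):].strip()
        elif buffer is not None:
            buffer += " " + line  # continuation line of a wrapped array
        if buffer is not None and buffer.endswith("]"):
            folds[key] = [float(x) for x in NUMBER.findall(buffer)]
            buffer = None
    return folds, means

# Usage on the Gradient Boosting test_mcc block from this log
sample = """key: test_mcc
value: [0.66143783 1. 1. 0.8660254 0.8660254 0.8660254
1. 0.4472136 0.8660254 0.42857143]
mean value: 0.8001324466975289"""

folds, means = parse_cv_log(sample)
print(len(folds["test_mcc"]), means["test_mcc"])
```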
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.07449079 0.06415701 0.06411934 0.06436348 0.06481528 0.06425691
0.06450534 0.06399703 0.0644269 0.06456614]
mean value: 0.0653698205947876
key: score_time
value: [0.00915241 0.00884914 0.0087738 0.00880289 0.00888062 0.00877213
0.00884771 0.00880075 0.00886726 0.00878549]
mean value: 0.00885322093963623
key: test_mcc
value: [0.66143783 0.76376262 0.8660254 0.63245553 0.63245553 0.74535599
1. 0.42857143 0.74535599 0.42857143]
mean value: 0.6903991753586628
key: train_mcc
value: [0.87447286 0.88988095 0.85947992 0.875 0.92198755 0.95417386
0.87542756 0.84375 0.89073374 0.89073374]
mean value: 0.887564019240932
key: test_accuracy
value: [0.8 0.86666667 0.92857143 0.78571429 0.78571429 0.85714286
1. 0.71428571 0.85714286 0.71428571]
mean value: 0.830952380952381
key: train_accuracy
value: [0.93700787 0.94488189 0.9296875 0.9375 0.9609375 0.9765625
0.9375 0.921875 0.9453125 0.9453125 ]
mean value: 0.9436577263779528
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_fscore
value: [0.82352941 0.85714286 0.92307692 0.72727273 0.72727273 0.83333333
1. 0.71428571 0.875 0.71428571]
mean value: 0.8195199408434702
key: train_fscore
value: [0.93650794 0.94488189 0.92913386 0.9375 0.96124031 0.97709924
0.93846154 0.921875 0.94573643 0.94573643]
mean value: 0.9438172637936766
key: test_precision
value: [0.7 1. 1. 1. 1. 1.
1. 0.71428571 0.77777778 0.71428571]
mean value: 0.8906349206349207
key: train_precision
value: [0.9516129 0.9375 0.93650794 0.9375 0.95384615 0.95522388
0.92424242 0.921875 0.93846154 0.93846154]
mean value: 0.9395231375342413
key: test_recall
value: [1. 0.75 0.85714286 0.57142857 0.57142857 0.71428571
1. 0.71428571 1. 0.71428571]
mean value: 0.7892857142857143
key: train_recall
value: [0.921875 0.95238095 0.921875 0.9375 0.96875 1.
0.953125 0.921875 0.953125 0.953125 ]
mean value: 0.9483630952380953
key: test_roc_auc
value: [0.8125 0.875 0.92857143 0.78571429 0.78571429 0.85714286
1. 0.71428571 0.85714286 0.71428571]
mean value: 0.8330357142857143
key: train_roc_auc
value: [0.93712798 0.94494048 0.9296875 0.9375 0.9609375 0.9765625
0.9375 0.921875 0.9453125 0.9453125 ]
mean value: 0.9436755952380952
key: test_jcc
value: [0.7 0.75 0.85714286 0.57142857 0.57142857 0.71428571
1. 0.55555556 0.77777778 0.55555556]
mean value: 0.7053174603174603
key: train_jcc
value: [0.88059701 0.89552239 0.86764706 0.88235294 0.92537313 0.95522388
0.88405797 0.85507246 0.89705882 0.89705882]
mean value: 0.893996449975188
MCC on Blind test: 0.08
Accuracy on Blind test: 0.54
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02492595 0.01881099 0.02069116 0.02081108 0.0185163 0.02902484
0.02047849 0.03224087 0.03128767 0.02006626]
mean value: 0.023685359954833986
key: score_time
value: [0.0105257 0.0105207 0.01081491 0.01050901 0.01055932 0.01063681
0.01051402 0.01091695 0.01072741 0.010885 ]
mean value: 0.010660982131958008
key: test_mcc
value: [0.48075018 0.56818182 0.56490196 0.47727273 0.91605722 0.74242424
0.83743579 0.82575758 0.91287093 0.54772256]
mean value: 0.6873374995187688
key: train_mcc
value: [0.82452636 0.74645342 0.7859188 0.7954287 0.73693234 0.78548989
0.75613935 0.78536075 0.76756932 0.79615403]
mean value: 0.777997295379433
key: test_accuracy
value: [0.73913043 0.7826087 0.7826087 0.73913043 0.95652174 0.86956522
0.91304348 0.91304348 0.95454545 0.77272727]
mean value: 0.8422924901185771
key: train_accuracy
value: [0.91219512 0.87317073 0.89268293 0.89756098 0.86829268 0.89268293
0.87804878 0.89268293 0.88349515 0.89805825]
mean value: 0.8888870471228985
key: test_fscore
value: [0.7 0.7826087 0.76190476 0.72727273 0.96 0.86956522
0.92307692 0.91666667 0.95652174 0.76190476]
mean value: 0.8359521492999754
key: train_fscore
value: [0.91346154 0.875 0.8952381 0.89952153 0.86956522 0.89108911
0.87804878 0.89215686 0.88571429 0.89855072]
mean value: 0.8898346144687178
key: test_precision
value: [0.77777778 0.75 0.8 0.72727273 0.92307692 0.90909091
0.85714286 0.91666667 0.91666667 0.8 ]
mean value: 0.8377694527694528
key: train_precision
value: [0.9047619 0.86666667 0.87850467 0.88679245 0.85714286 0.9
0.87378641 0.89215686 0.86915888 0.89423077]
mean value: 0.8823201472546344
key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.72727273 1. 0.83333333
1. 0.91666667 1. 0.72727273]
mean value: 0.8386363636363636
key: train_recall
value: [0.9223301 0.88349515 0.91262136 0.91262136 0.88235294 0.88235294
0.88235294 0.89215686 0.90291262 0.90291262]
mean value: 0.8976108890158006
key: test_roc_auc
value: [0.73484848 0.78409091 0.78030303 0.73863636 0.95454545 0.87121212
0.90909091 0.91287879 0.95454545 0.77272727]
mean value: 0.8412878787878788
key: train_roc_auc
value: [0.91214544 0.87312012 0.89258519 0.89748715 0.86836094 0.89263278
0.87806967 0.89268037 0.88349515 0.89805825]
mean value: 0.8888635065676757
key: test_jcc
value: [0.53846154 0.64285714 0.61538462 0.57142857 0.92307692 0.76923077
0.85714286 0.84615385 0.91666667 0.61538462]
mean value: 0.7295787545787545
key: train_jcc
value: [0.84070796 0.77777778 0.81034483 0.8173913 0.76923077 0.80357143
0.7826087 0.80530973 0.79487179 0.81578947]
mean value: 0.8017603770837232
MCC on Blind test: 0.35
Accuracy on Blind test: 0.67
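The blind-test MCC and accuracy summarise the same confusion matrix in different ways. MCC uses all four cells and penalises errors on both classes, so a moderate accuracy can coexist with a much lower MCC. Below is a minimal pure-Python sketch of both formulas. The counts in the usage lines are hypothetical, because this log does not print the blind-test confusion matrix:

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any row/column total is zero (denominator undefined).
    return num / den if den else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, NOT the real blind-test results.
tp, tn, fp, fn = 40, 20, 15, 15
print(f"MCC: {mcc(tp, tn, fp, fn):.3f}  accuracy: {accuracy(tp, tn, fp, fn):.3f}")
```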
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
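This warning is emitted repeatedly by the LogisticRegressionCV fits in this run. It suggests two remedies: scale the features and raise `max_iter` (100 by default for LogisticRegressionCV). Below is a hedged sketch that applies both through a scikit-learn Pipeline. It runs on a synthetic 185 x 50 dataset that stands in for the real one; the dimensions come from the top of this log. This log does not show whether these settings silence every warning on the real features, so treat that as an assumption:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the logged data; NOT the real dataset.
X, y = make_classification(n_samples=185, n_features=50, n_informative=10,
                           random_state=42)

# Standardise first, then give lbfgs more iterations (the two remedies the warning names).
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(max_iter=5000, random_state=42),
)

# Record, rather than print, any ConvergenceWarning raised during fitting.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(f"ConvergenceWarnings raised: {n_convergence}")
print(f"training accuracy: {model.score(X, y):.3f}")
```

In the real script, the 7 categorical columns may need separate handling, for example a ColumnTransformer that scales only the numerical columns. That preprocessing is not visible in this log.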
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.64007044 0.59848499 0.60545135 0.81422853 0.62158179 0.65589404
0.78231335 0.678689 0.63352537 0.76259565]
mean value: 0.6792834520339965
key: score_time
value: [0.01374793 0.01383209 0.01083827 0.01381516 0.01120448 0.01409101
0.01422381 0.01409006 0.01428652 0.01422524]
mean value: 0.013435459136962891
key: test_mcc
value: [0.65909298 0.74242424 0.74047959 0.56818182 0.83971912 0.82575758
0.65151515 0.74242424 0.68313005 0.73029674]
mean value: 0.7183021519902297
key: train_mcc
value: [0.90310636 1. 0.88308106 1. 0.88292404 0.88361919
0.86358877 0.94146202 0.99033794 0.89358299]
mean value: 0.9241702374683562
key: test_accuracy
value: [0.82608696 0.86956522 0.86956522 0.7826087 0.91304348 0.91304348
0.82608696 0.86956522 0.81818182 0.86363636]
mean value: 0.8551383399209486
key: train_accuracy
value: [0.95121951 1. 0.94146341 1. 0.94146341 0.94146341
0.93170732 0.97073171 0.99514563 0.94660194]
mean value: 0.9619796353303338
key: test_fscore
value: [0.8 0.86956522 0.85714286 0.7826087 0.90909091 0.91666667
0.83333333 0.86956522 0.77777778 0.86956522]
mean value: 0.848531589183763
key: train_fscore
value: [0.95238095 1. 0.94230769 1. 0.94117647 0.94230769
0.93203883 0.97058824 0.99512195 0.94736842]
mean value: 0.962329025010229
key: test_precision
value: [0.88888889 0.83333333 0.9 0.75 1. 0.91666667
0.83333333 0.90909091 1. 0.83333333]
mean value: 0.8864646464646465
key: train_precision
value: [0.93457944 1. 0.93333333 1. 0.94117647 0.9245283
0.92307692 0.97058824 1. 0.93396226]
mean value: 0.9561244967582682
key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.81818182 0.83333333 0.91666667
0.83333333 0.83333333 0.63636364 0.90909091]
mean value: 0.8234848484848485
key: train_recall
value: [0.97087379 1. 0.95145631 1. 0.94117647 0.96078431
0.94117647 0.97058824 0.99029126 0.96116505]
mean value: 0.9687511897963069
key: test_roc_auc
value: [0.8219697 0.87121212 0.86742424 0.78409091 0.91666667 0.91287879
0.82575758 0.87121212 0.81818182 0.86363636]
mean value: 0.8553030303030303
key: train_roc_auc
value: [0.95112317 1. 0.94141443 1. 0.94146202 0.94155721
0.93175328 0.97073101 0.99514563 0.94660194]
mean value: 0.96197886921759
key: test_jcc
value: [0.66666667 0.76923077 0.75 0.64285714 0.83333333 0.84615385
0.71428571 0.76923077 0.63636364 0.76923077]
mean value: 0.7397352647352647
key: train_jcc
value: [0.90909091 1. 0.89090909 1. 0.88888889 0.89090909
0.87272727 0.94285714 0.99029126 0.9 ]
mean value: 0.9285673657518317
MCC on Blind test: 0.2
Accuracy on Blind test: 0.59
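The lbfgs ConvergenceWarnings printed before this block were likely emitted while fitting LogisticRegressionCV. They suggest raising max_iter or scaling the inputs. The sketch below counts how many ConvergenceWarnings a MinMax-scaled LogisticRegressionCV raises at the default max_iter and at a larger one. It makes several assumptions. It uses synthetic make_classification data as a stand-in for the real feature table. The helper count_convergence_warnings is hypothetical and not part of ml_data.py. The value 5000 is an arbitrary example.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): not the real pncA feature table.
X, y = make_classification(n_samples=200, n_features=50, random_state=42)


def count_convergence_warnings(max_iter):
    """Hypothetical helper: fit a scaled LogisticRegressionCV and count
    the ConvergenceWarnings raised by the lbfgs solver at this max_iter."""
    pipe = Pipeline([
        ("prep", MinMaxScaler()),
        ("model", LogisticRegressionCV(max_iter=max_iter, random_state=42)),
    ])
    with warnings.catch_warnings(record=True) as caught:
        # Record every ConvergenceWarning instead of showing it only once
        warnings.simplefilter("always", ConvergenceWarning)
        pipe.fit(X, y)
    n_warn = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return pipe, n_warn


# max_iter=100 is the scikit-learn default for LogisticRegressionCV
pipe_default, n_default = count_convergence_warnings(max_iter=100)
pipe_long, n_long = count_convergence_warnings(max_iter=5000)
print("ConvergenceWarnings at max_iter=100: ", n_default)
print("ConvergenceWarnings at max_iter=5000:", n_long)
```

A larger max_iter only lets lbfgs keep going after it would otherwise stop. So it should never add warnings, and the per-fold scores change only for fits that previously stopped early.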
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00964332 0.00924993 0.00782824 0.00768685 0.00750494 0.00749516
0.007514 0.00766039 0.00757623 0.00769448]
mean value: 0.007985353469848633
key: score_time
value: [0.01074767 0.00942302 0.00893474 0.0085392 0.00857997 0.00853729
0.00858474 0.00858116 0.0085609 0.00856519]
mean value: 0.008905386924743653
key: test_mcc
value: [0.44411739 0.50460839 0.2096648 0.23262105 0.40451992 0.65909298
0.47923384 0.62050523 0.39735971 0.20412415]
mean value: 0.4155847461790167
key: train_mcc
value: [0.39137259 0.44043936 0.45968386 0.4798642 0.4267072 0.43504485
0.45392287 0.42888555 0.44151079 0.46358632]
mean value: 0.4421017579498864
key: test_accuracy
value: [0.69565217 0.69565217 0.56521739 0.60869565 0.65217391 0.82608696
0.69565217 0.7826087 0.63636364 0.59090909]
mean value: 0.674901185770751
key: train_accuracy
value: [0.63414634 0.68780488 0.70243902 0.70731707 0.67804878 0.68292683
0.69756098 0.68292683 0.69417476 0.7038835 ]
mean value: 0.6871228984134502
key: test_fscore
value: [0.74074074 0.75862069 0.66666667 0.64 0.75 0.84615385
0.77419355 0.82758621 0.73333333 0.66666667]
mean value: 0.7403961698500074
key: train_fscore
value: [0.73309609 0.75384615 0.76078431 0.76744186 0.74615385 0.74903475
0.75590551 0.74708171 0.75294118 0.76078431]
mean value: 0.7527069722703967
key: test_precision
value: [0.625 0.61111111 0.52631579 0.57142857 0.6 0.78571429
0.63157895 0.70588235 0.57894737 0.5625 ]
mean value: 0.6198478426458303
key: train_precision
value: [0.57865169 0.62420382 0.63815789 0.63870968 0.61392405 0.61783439
0.63157895 0.61935484 0.63157895 0.63815789]
mean value: 0.6232152152926238
key: test_recall
value: [0.90909091 1. 0.90909091 0.72727273 1. 0.91666667
1. 1. 1. 0.81818182]
mean value: 0.928030303030303
key: train_recall
value: [1. 0.95145631 0.94174757 0.96116505 0.95098039 0.95098039
0.94117647 0.94117647 0.93203883 0.94174757]
mean value: 0.9512469065296021
key: test_roc_auc
value: [0.70454545 0.70833333 0.57954545 0.61363636 0.63636364 0.8219697
0.68181818 0.77272727 0.63636364 0.59090909]
mean value: 0.6746212121212122
key: train_roc_auc
value: [0.63235294 0.68651247 0.70126594 0.70607272 0.67937369 0.68422806
0.69874358 0.68418047 0.69417476 0.7038835 ]
mean value: 0.6870788121073672
key: test_jcc
value: [0.58823529 0.61111111 0.5 0.47058824 0.6 0.73333333
0.63157895 0.70588235 0.57894737 0.5 ]
mean value: 0.591967664258686
key: train_jcc
value: [0.57865169 0.60493827 0.61392405 0.62264151 0.59509202 0.59876543
0.60759494 0.59627329 0.60377358 0.61392405]
mean value: 0.6035578837876612
MCC on Blind test: 0.48
Accuracy on Blind test: 0.71
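The "key / value / mean value" layout in each block has the same shape and key order as the dictionary returned by scikit-learn's cross_validate with return_train_score=True. That order is fit_time, score_time, then interleaved test_<name>/train_<name> pairs. The sketch below reproduces that printout. It assumes the script builds a scoring dict with the names mcc, accuracy, fscore, precision, recall, roc_auc and jcc; these are inferred from the printed keys, not taken from ml_data.py. It uses a 10-fold StratifiedKFold to mirror the ten values per key. The synthetic data is a stand-in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption), mildly imbalanced binary labels
X, y = make_classification(n_samples=200, n_features=20, weights=[0.4],
                           random_state=42)

# Scorer names inferred from the printed keys (test_mcc, test_jcc, ...)
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

pipe = Pipeline([("prep", MinMaxScaler()), ("model", GaussianNB())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                         return_train_score=True)

# Print in the same key / value / mean value layout as the log
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Because return_train_score=True, every test_<name> key is immediately followed by its train_<name> counterpart. This is the interleaving seen above.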
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00798321 0.00766301 0.00776434 0.00779152 0.0078249 0.00776362
0.0078249 0.00781822 0.00786686 0.00770688]
mean value: 0.007800745964050293
key: score_time
value: [0.00858808 0.00862932 0.00854731 0.00855184 0.00859928 0.00858569
0.00869465 0.0087173 0.0086596 0.00857878]
mean value: 0.00861518383026123
key: test_mcc
value: [ 0.3030303 0.15096491 -0.03816905 0.3030303 0.39727608 0.56818182
0.39727608 0.31252706 0.29277002 0.09245003]
mean value: 0.27793375505710965
key: train_mcc
value: [0.37046449 0.38910743 0.39476736 0.38236392 0.38354703 0.35891522
0.35302365 0.36367161 0.37290762 0.39345795]
mean value: 0.37622262820367175
key: test_accuracy
value: [0.65217391 0.56521739 0.47826087 0.65217391 0.69565217 0.7826087
0.69565217 0.65217391 0.63636364 0.54545455]
mean value: 0.6355731225296443
key: train_accuracy
value: [0.68292683 0.69268293 0.69268293 0.68780488 0.68780488 0.67804878
0.67317073 0.67804878 0.68446602 0.68932039]
mean value: 0.6846957139474308
key: test_fscore
value: [0.63636364 0.61538462 0.5 0.63636364 0.74074074 0.7826087
0.74074074 0.71428571 0.69230769 0.58333333]
mean value: 0.6642128805172284
key: train_fscore
value: [0.70852018 0.71493213 0.72489083 0.71681416 0.71428571 0.69444444
0.69955157 0.70535714 0.70588235 0.72649573]
mean value: 0.7111174245586319
key: test_precision
value: [0.63636364 0.53333333 0.46153846 0.63636364 0.66666667 0.81818182
0.66666667 0.625 0.6 0.53846154]
mean value: 0.6182575757575758
key: train_precision
value: [0.65833333 0.66949153 0.65873016 0.65853659 0.6557377 0.65789474
0.6446281 0.64754098 0.66101695 0.64885496]
mean value: 0.6560765038377927
key: test_recall
value: [0.63636364 0.72727273 0.54545455 0.63636364 0.83333333 0.75
0.83333333 0.83333333 0.81818182 0.63636364]
mean value: 0.725
key: train_recall
value: [0.76699029 0.76699029 0.80582524 0.78640777 0.78431373 0.73529412
0.76470588 0.7745098 0.75728155 0.82524272]
mean value: 0.7767561393489435
key: test_roc_auc
value: [0.65151515 0.5719697 0.48106061 0.65151515 0.68939394 0.78409091
0.68939394 0.64393939 0.63636364 0.54545455]
mean value: 0.634469696969697
key: train_roc_auc
value: [0.68251475 0.69231868 0.69212831 0.68732153 0.68827337 0.67832667
0.67361508 0.67851704 0.68446602 0.68932039]
mean value: 0.6846801827527127
key: test_jcc
value: [0.46666667 0.44444444 0.33333333 0.46666667 0.58823529 0.64285714
0.58823529 0.55555556 0.52941176 0.41176471]
mean value: 0.5027170868347339
key: train_jcc
value: [0.54861111 0.55633803 0.56849315 0.55862069 0.55555556 0.53191489
0.53793103 0.54482759 0.54545455 0.5704698 ]
mean value: 0.5518216393594725
MCC on Blind test: 0.47
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00768375 0.00718713 0.0075891 0.00747037 0.00748372 0.00737071
0.00746584 0.00747824 0.00745106 0.00744772]
mean value: 0.007462763786315918
key: score_time
value: [0.00988078 0.00973558 0.00988078 0.00987172 0.01486492 0.00991106
0.00977206 0.00987005 0.00975847 0.00983 ]
mean value: 0.010337543487548829
key: test_mcc
value: [-0.05427825 0.13740858 0.56818182 0.56490196 0.31252706 0.58930667
0.31298622 0.74242424 0.2773501 0.09245003]
mean value: 0.35432584250263544
key: train_mcc
value: [0.66217798 0.6392382 0.62934402 0.59038553 0.67133261 0.6310448
0.68889027 0.65854355 0.59234469 0.68222103]
mean value: 0.6445522695261332
key: test_accuracy
value: [0.47826087 0.56521739 0.7826087 0.7826087 0.65217391 0.7826087
0.65217391 0.86956522 0.63636364 0.54545455]
mean value: 0.674703557312253
key: train_accuracy
value: [0.82926829 0.8195122 0.81463415 0.79512195 0.83414634 0.81463415
0.84390244 0.82926829 0.7961165 0.83980583]
mean value: 0.8216410134975136
key: test_fscore
value: [0.4 0.58333333 0.7826087 0.76190476 0.71428571 0.76190476
0.63636364 0.86956522 0.6 0.5 ]
mean value: 0.6609966120835686
key: train_fscore
value: [0.83870968 0.82296651 0.81730769 0.79411765 0.82474227 0.80612245
0.83838384 0.82758621 0.79411765 0.84651163]
mean value: 0.8210565561229923
key: test_precision
value: [0.44444444 0.53846154 0.75 0.8 0.625 0.88888889
0.7 0.90909091 0.66666667 0.55555556]
mean value: 0.6878108003108003
key: train_precision
value: [0.79824561 0.81132075 0.80952381 0.8019802 0.86956522 0.84042553
0.86458333 0.83168317 0.8019802 0.8125 ]
mean value: 0.8241807825271845
key: test_recall
value: [0.36363636 0.63636364 0.81818182 0.72727273 0.83333333 0.66666667
0.58333333 0.83333333 0.54545455 0.45454545]
mean value: 0.6462121212121212
key: train_recall
value: [0.88349515 0.83495146 0.82524272 0.78640777 0.78431373 0.7745098
0.81372549 0.82352941 0.78640777 0.88349515]
mean value: 0.8196078431372549
key: test_roc_auc
value: [0.47348485 0.56818182 0.78409091 0.78030303 0.64393939 0.78787879
0.65530303 0.87121212 0.63636364 0.54545455]
mean value: 0.6746212121212121
key: train_roc_auc
value: [0.82900247 0.81943651 0.81458214 0.79516467 0.83390444 0.81443937
0.84375595 0.82924043 0.7961165 0.83980583]
mean value: 0.8215448315248429
key: test_jcc
value: [0.25 0.41176471 0.64285714 0.61538462 0.55555556 0.61538462
0.46666667 0.76923077 0.42857143 0.33333333]
mean value: 0.508874883286648
key: train_jcc
value: [0.72222222 0.69918699 0.69105691 0.65853659 0.70175439 0.67521368
0.72173913 0.70588235 0.65853659 0.73387097]
mean value: 0.6967999807689436
MCC on Blind test: 0.3
Accuracy on Blind test: 0.65
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01029658 0.01014757 0.01019192 0.01032376 0.01028013 0.01028109
0.01010323 0.01022696 0.01013231 0.01031971]
mean value: 0.010230326652526855
key: score_time
value: [0.00922775 0.00909877 0.0091269 0.00924611 0.00912356 0.00912118
0.00901008 0.00927448 0.00904131 0.00910091]
mean value: 0.009137105941772462
key: test_mcc
value: [0.56490196 0.65151515 0.38932432 0.38932432 0.42228828 0.66414149
0.65909298 0.82575758 0.64715023 0.46225016]
mean value: 0.5675746466667577
key: train_mcc
value: [0.80487341 0.72698715 0.75693529 0.76584809 0.67808871 0.71711403
0.70747264 0.72814868 0.73789886 0.74884444]
mean value: 0.7372211285541519
key: test_accuracy
value: [0.7826087 0.82608696 0.69565217 0.69565217 0.69565217 0.82608696
0.82608696 0.91304348 0.81818182 0.72727273]
mean value: 0.7806324110671937
key: train_accuracy
value: [0.90243902 0.86341463 0.87804878 0.88292683 0.83902439 0.85853659
0.85365854 0.86341463 0.86893204 0.87378641]
mean value: 0.8684181861236088
key: test_fscore
value: [0.76190476 0.81818182 0.66666667 0.66666667 0.75862069 0.81818182
0.84615385 0.91666667 0.8 0.7 ]
mean value: 0.7753042934077417
key: train_fscore
value: [0.90291262 0.8627451 0.88151659 0.88349515 0.83902439 0.85853659
0.85436893 0.86666667 0.86956522 0.87735849]
mean value: 0.8696189734979832
key: test_precision
value: [0.8 0.81818182 0.7 0.7 0.64705882 0.9
0.78571429 0.91666667 0.88888889 0.77777778]
mean value: 0.7934288260758849
key: train_precision
value: [0.90291262 0.87128713 0.86111111 0.88349515 0.83495146 0.85436893
0.84615385 0.84259259 0.86538462 0.85321101]
mean value: 0.8615468458469154
key: test_recall
value: [0.72727273 0.81818182 0.63636364 0.63636364 0.91666667 0.75
0.91666667 0.91666667 0.72727273 0.63636364]
mean value: 0.7681818181818182
key: train_recall
value: [0.90291262 0.85436893 0.90291262 0.88349515 0.84313725 0.8627451
0.8627451 0.89215686 0.87378641 0.90291262]
mean value: 0.8781172663240053
key: test_roc_auc
value: [0.78030303 0.82575758 0.69318182 0.69318182 0.68560606 0.82954545
0.8219697 0.91287879 0.81818182 0.72727273]
mean value: 0.7787878787878787
key: train_roc_auc
value: [0.9024367 0.86345898 0.8779269 0.88292404 0.83904436 0.85855702
0.85370265 0.86355416 0.86893204 0.87378641]
mean value: 0.8684323243860652
key: test_jcc
value: [0.61538462 0.69230769 0.5 0.5 0.61111111 0.69230769
0.73333333 0.84615385 0.66666667 0.53846154]
mean value: 0.6395726495726496
key: train_jcc
value: [0.82300885 0.75862069 0.78813559 0.79130435 0.72268908 0.75213675
0.74576271 0.76470588 0.76923077 0.78151261]
mean value: 0.7697107276516258
MCC on Blind test: 0.45
Accuracy on Blind test: 0.72
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.3924427 0.61908412 0.74449182 0.91650462 0.63582444 0.65240908
0.77556181 0.61309528 0.7518034 0.7737174 ]
mean value: 0.6874934673309326
key: score_time
value: [0.0111208 0.01100135 0.01523304 0.01314092 0.01097083 0.01089597
0.01093626 0.01093769 0.01482415 0.01095867]
mean value: 0.012001967430114746
key: test_mcc
value: [0.31252706 0.58930667 0.69084928 0.56818182 0.65909298 0.74047959
0.83743579 0.82575758 0.81818182 0.46225016]
mean value: 0.6504062738835646
key: train_mcc
value: [0.7606076 0.7863314 0.86610349 0.91330072 0.82498132 0.87660499
0.79068188 0.88440807 0.88499797 0.91300871]
mean value: 0.8501026166060419
key: test_accuracy
value: [0.65217391 0.7826087 0.82608696 0.7826087 0.82608696 0.86956522
0.91304348 0.91304348 0.90909091 0.72727273]
mean value: 0.8201581027667985
key: train_accuracy
value: [0.87804878 0.88780488 0.93170732 0.95609756 0.91219512 0.93658537
0.89268293 0.94146341 0.94174757 0.95631068]
mean value: 0.9234643618280843
key: test_fscore
value: [0.55555556 0.8 0.77777778 0.7826087 0.84615385 0.88
0.92307692 0.91666667 0.90909091 0.7 ]
mean value: 0.8090930373973853
key: train_fscore
value: [0.87179487 0.89686099 0.92929293 0.95522388 0.91 0.93896714
0.88541667 0.93939394 0.94339623 0.9569378 ]
mean value: 0.9227284435900899
key: test_precision
value: [0.71428571 0.71428571 1. 0.75 0.78571429 0.84615385
0.85714286 0.91666667 0.90909091 0.77777778]
mean value: 0.8271117771117771
key: train_precision
value: [0.92391304 0.83333333 0.96842105 0.97959184 0.92857143 0.9009009
0.94444444 0.96875 0.91743119 0.94339623]
mean value: 0.9308753459170286
key: test_recall
value: [0.45454545 0.90909091 0.63636364 0.81818182 0.91666667 0.91666667
1. 0.91666667 0.90909091 0.63636364]
mean value: 0.8113636363636364
key: train_recall
value: [0.82524272 0.97087379 0.89320388 0.93203883 0.89215686 0.98039216
0.83333333 0.91176471 0.97087379 0.97087379]
mean value: 0.9180753854940035
key: test_roc_auc
value: [0.64393939 0.78787879 0.81818182 0.78409091 0.8219697 0.86742424
0.90909091 0.91287879 0.90909091 0.72727273]
mean value: 0.8181818181818181
key: train_roc_auc
value: [0.87830763 0.88739768 0.93189606 0.9562155 0.91209785 0.93679802
0.89239482 0.94131925 0.94174757 0.95631068]
mean value: 0.9234485056158386
key: test_jcc
value: [0.38461538 0.66666667 0.63636364 0.64285714 0.73333333 0.78571429
0.85714286 0.84615385 0.83333333 0.53846154]
mean value: 0.6924642024642025
key: train_jcc
value: [0.77272727 0.81300813 0.86792453 0.91428571 0.83486239 0.88495575
0.79439252 0.88571429 0.89285714 0.91743119]
mean value: 0.8578158927526129
MCC on Blind test: 0.28
Accuracy on Blind test: 0.64
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01156855 0.00892901 0.00856614 0.0077455 0.00787854 0.00857687
0.00829124 0.00852704 0.00854969 0.00859594]
mean value: 0.008722853660583497
key: score_time
value: [0.01051092 0.00886154 0.00785089 0.00785041 0.008111 0.00851727
0.00845528 0.00843167 0.0078907 0.00845909]
mean value: 0.008493876457214356
key: test_mcc
value: [0.74242424 1. 0.91605722 0.66414149 0.83971912 0.74242424
0.74047959 0.74242424 0.83205029 0.73029674]
mean value: 0.7950017190609769
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86956522 1. 0.95652174 0.82608696 0.91304348 0.86956522
0.86956522 0.86956522 0.90909091 0.86363636]
mean value: 0.8946640316205533
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 1. 0.95238095 0.83333333 0.90909091 0.86956522
0.88 0.86956522 0.9 0.85714286]
mean value: 0.8940643704121964
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 1. 1. 0.76923077 1. 0.90909091
0.84615385 0.90909091 1. 0.9 ]
mean value: 0.9166899766899766
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 1. 0.90909091 0.90909091 0.83333333 0.83333333
0.91666667 0.83333333 0.81818182 0.81818182]
mean value: 0.878030303030303
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87121212 1. 0.95454545 0.82954545 0.91666667 0.87121212
0.86742424 0.87121212 0.90909091 0.86363636]
mean value: 0.8954545454545455
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 1. 0.90909091 0.71428571 0.83333333 0.76923077
0.78571429 0.76923077 0.81818182 0.75 ]
mean value: 0.8118298368298369
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.04
Accuracy on Blind test: 0.51
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08520198 0.08628082 0.08537316 0.08609509 0.08660245 0.08801889
0.08706832 0.08594346 0.08755255 0.08668995]
mean value: 0.08648266792297363
key: score_time
value: [0.01721334 0.01664853 0.01649141 0.0179987 0.01672387 0.01804256
0.016927 0.01826859 0.01752901 0.01691222]
mean value: 0.017275524139404298
key: test_mcc
value: [0.56490196 1. 0.83743579 0.6992059 0.69084928 0.82575758
0.83743579 0.91605722 1. 0.81818182]
mean value: 0.8189825331284214
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.7826087 1. 0.91304348 0.82608696 0.82608696 0.91304348
0.91304348 0.95652174 1. 0.90909091]
mean value: 0.9039525691699605
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76190476 1. 0.9 0.84615385 0.85714286 0.91666667
0.92307692 0.96 1. 0.90909091]
mean value: 0.9074035964035964
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 1. 1. 0.73333333 0.75 0.91666667
0.85714286 0.92307692 1. 0.90909091]
mean value: 0.8889310689310689
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 1. 0.81818182 1. 1. 0.91666667
1. 1. 1. 0.90909091]
mean value: 0.9371212121212121
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78030303 1. 0.90909091 0.83333333 0.81818182 0.91287879
0.90909091 0.95454545 1. 0.90909091]
mean value: 0.9026515151515151
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61538462 1. 0.81818182 0.73333333 0.75 0.84615385
0.85714286 0.92307692 1. 0.83333333]
mean value: 0.8376606726606727
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.63
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0074172 0.00699353 0.00702 0.0070374 0.00713778 0.00709534
0.00710583 0.00709867 0.00728655 0.00706387]
mean value: 0.0071256160736083984
key: score_time
value: [0.00813842 0.00795412 0.00787568 0.00795722 0.00794506 0.00791264
0.0079174 0.00833249 0.00816393 0.00792503]
mean value: 0.008012199401855468
key: test_mcc
value: [0.48075018 0.39727608 0.65909298 0.48856385 0.56490196 0.82575758
0.56818182 0.56818182 0.20412415 0.54772256]
mean value: 0.5304552960001759
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.73913043 0.69565217 0.82608696 0.73913043 0.7826087 0.91304348
0.7826087 0.7826087 0.59090909 0.77272727]
mean value: 0.7624505928853755
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7 0.63157895 0.8 0.75 0.8 0.91666667
0.7826087 0.7826087 0.47058824 0.76190476]
mean value: 0.7395956002538315
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.75 0.88888889 0.69230769 0.76923077 0.91666667
0.81818182 0.81818182 0.66666667 0.8 ]
mean value: 0.7897902097902098
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.63636364 0.54545455 0.72727273 0.81818182 0.83333333 0.91666667
0.75 0.75 0.36363636 0.72727273]
mean value: 0.7068181818181818
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.73484848 0.68939394 0.8219697 0.74242424 0.78030303 0.91287879
0.78409091 0.78409091 0.59090909 0.77272727]
mean value: 0.7613636363636364
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.53846154 0.46153846 0.66666667 0.6 0.66666667 0.84615385
0.64285714 0.64285714 0.30769231 0.61538462]
mean value: 0.5988278388278389
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.15525389 1.24308133 1.07740855 1.09325242 1.08529234 1.07614017
1.08734107 1.08477783 1.07814384 1.07799172]
mean value: 1.1058683156967164
key: score_time
value: [0.09643054 0.09530497 0.09550691 0.09580946 0.09401822 0.09396243
0.09032536 0.09210563 0.08807588 0.09230471]
mean value: 0.09338440895080566
key: test_mcc
value: [0.56490196 1. 0.91605722 0.76764947 0.91666667 0.82575758
0.83743579 0.91605722 1. 0.91287093]
mean value: 0.8657396839505284
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.7826087 1. 0.95652174 0.86956522 0.95652174 0.91304348
0.91304348 0.95652174 1. 0.95454545]
mean value: 0.9302371541501976
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76190476 1. 0.95238095 0.88 0.95652174 0.91666667
0.92307692 0.96 1. 0.95652174]
mean value: 0.9307072782290173
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 1. 1. 0.78571429 1. 0.91666667
0.85714286 0.92307692 1. 0.91666667]
mean value: 0.9199267399267399
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.72727273 1. 0.90909091 1. 0.91666667 0.91666667
1. 1. 1. 1. ]
mean value: 0.946969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78030303 1. 0.95454545 0.875 0.95833333 0.91287879
0.90909091 0.95454545 1. 0.95454545]
mean value: 0.9299242424242424
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.61538462 1. 0.90909091 0.78571429 0.91666667 0.84615385
0.85714286 0.92307692 1. 0.91666667]
mean value: 0.876989676989677
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.56
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.86088943 0.88546109 0.83543372 0.87413931 0.92852831 0.89648938
0.97815275 0.92712474 0.90053868 0.94448614]
mean value: 0.9031243562698364
key: score_time
value: [0.25403523 0.22465062 0.24028587 0.24344826 0.16241813 0.23810482
0.23798776 0.16599536 0.18922591 0.23433256]
mean value: 0.21904845237731935
key: test_mcc
value: [0.48075018 0.91666667 0.82575758 0.47727273 1. 0.74242424
0.83743579 0.91605722 1. 0.64715023]
mean value: 0.7843514630610502
key: train_mcc
value: [0.90516294 0.89609853 0.89781488 0.91325992 0.91435567 0.92355447
0.91435567 0.92355447 0.89663335 0.89663335]
mean value: 0.9081423249606505
key: test_accuracy
value: [0.73913043 0.95652174 0.91304348 0.73913043 1. 0.86956522
0.91304348 0.95652174 1. 0.81818182]
mean value: 0.8905138339920948
key: train_accuracy
value: [0.95121951 0.94634146 0.94634146 0.95609756 0.95609756 0.96097561
0.95609756 0.96097561 0.94660194 0.94660194]
mean value: 0.952735022495856
key: test_fscore
value: [0.7 0.95652174 0.90909091 0.72727273 1. 0.86956522
0.92307692 0.96 1. 0.83333333]
mean value: 0.8878860849295632
key: train_fscore
value: [0.95327103 0.94883721 0.94930876 0.95734597 0.95734597 0.96190476
0.95734597 0.96190476 0.94883721 0.94883721]
mean value: 0.9544938850206197
key: test_precision
value: [0.77777778 0.91666667 0.90909091 0.72727273 1. 0.90909091
0.85714286 0.92307692 1. 0.76923077]
mean value: 0.8789349539349539
key: train_precision
value: [0.91891892 0.91071429 0.90350877 0.93518519 0.9266055 0.93518519
0.9266055 0.93518519 0.91071429 0.91071429]
mean value: 0.9213337112721468
key: test_recall
value: [0.63636364 1. 0.90909091 0.72727273 1. 0.83333333
1. 1. 1. 0.90909091]
mean value: 0.9015151515151515
key: train_recall
value: [0.99029126 0.99029126 1. 0.98058252 0.99019608 0.99019608
0.99019608 0.99019608 0.99029126 0.99029126]
mean value: 0.9902531886541024
key: test_roc_auc
value: [0.73484848 0.95833333 0.91287879 0.73863636 1. 0.87121212
0.90909091 0.95454545 1. 0.81818182]
mean value: 0.8897727272727273
key: train_roc_auc
value: [0.95102798 0.94612602 0.94607843 0.95597754 0.95626309 0.96111746
0.95626309 0.96111746 0.94660194 0.94660194]
mean value: 0.9527174947648962
key: test_jcc
value: [0.53846154 0.91666667 0.83333333 0.57142857 1. 0.76923077
0.85714286 0.92307692 1. 0.71428571]
mean value: 0.8123626373626374
key: train_jcc
value: [0.91071429 0.90265487 0.90350877 0.91818182 0.91818182 0.9266055
0.91818182 0.9266055 0.90265487 0.90265487]
mean value: 0.9129944123133789
MCC on Blind test: 0.34
Accuracy on Blind test: 0.64
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01725674 0.00764561 0.00769782 0.00771165 0.00768232 0.00783229
0.00767946 0.00749207 0.00788879 0.00778747]
mean value: 0.008667421340942384
key: score_time
value: [0.01506519 0.00864601 0.00875545 0.00867391 0.00863385 0.00854945
0.00867081 0.00864387 0.00859499 0.00872564]
mean value: 0.009295916557312012
key: test_mcc
value: [ 0.3030303 0.15096491 -0.03816905 0.3030303 0.39727608 0.56818182
0.39727608 0.31252706 0.29277002 0.09245003]
mean value: 0.27793375505710965
key: train_mcc
value: [0.37046449 0.38910743 0.39476736 0.38236392 0.38354703 0.35891522
0.35302365 0.36367161 0.37290762 0.39345795]
mean value: 0.37622262820367175
key: test_accuracy
value: [0.65217391 0.56521739 0.47826087 0.65217391 0.69565217 0.7826087
0.69565217 0.65217391 0.63636364 0.54545455]
mean value: 0.6355731225296443
key: train_accuracy
value: [0.68292683 0.69268293 0.69268293 0.68780488 0.68780488 0.67804878
0.67317073 0.67804878 0.68446602 0.68932039]
mean value: 0.6846957139474308
key: test_fscore
value: [0.63636364 0.61538462 0.5 0.63636364 0.74074074 0.7826087
0.74074074 0.71428571 0.69230769 0.58333333]
mean value: 0.6642128805172284
key: train_fscore
value: [0.70852018 0.71493213 0.72489083 0.71681416 0.71428571 0.69444444
0.69955157 0.70535714 0.70588235 0.72649573]
mean value: 0.7111174245586319
key: test_precision
value: [0.63636364 0.53333333 0.46153846 0.63636364 0.66666667 0.81818182
0.66666667 0.625 0.6 0.53846154]
mean value: 0.6182575757575758
key: train_precision
value: [0.65833333 0.66949153 0.65873016 0.65853659 0.6557377 0.65789474
0.6446281 0.64754098 0.66101695 0.64885496]
mean value: 0.6560765038377927
key: test_recall
value: [0.63636364 0.72727273 0.54545455 0.63636364 0.83333333 0.75
0.83333333 0.83333333 0.81818182 0.63636364]
mean value: 0.725
key: train_recall
value: [0.76699029 0.76699029 0.80582524 0.78640777 0.78431373 0.73529412
0.76470588 0.7745098 0.75728155 0.82524272]
mean value: 0.7767561393489435
key: test_roc_auc
value: [0.65151515 0.5719697 0.48106061 0.65151515 0.68939394 0.78409091
0.68939394 0.64393939 0.63636364 0.54545455]
mean value: 0.634469696969697
key: train_roc_auc
value: [0.68251475 0.69231868 0.69212831 0.68732153 0.68827337 0.67832667
0.67361508 0.67851704 0.68446602 0.68932039]
mean value: 0.6846801827527127
key: test_jcc
value: [0.46666667 0.44444444 0.33333333 0.46666667 0.58823529 0.64285714
0.58823529 0.55555556 0.52941176 0.41176471]
mean value: 0.5027170868347339
key: train_jcc
value: [0.54861111 0.55633803 0.56849315 0.55862069 0.55555556 0.53191489
0.53793103 0.54482759 0.54545455 0.5704698 ]
mean value: 0.5518216393594725
MCC on Blind test: 0.47
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.11559343 0.039253 0.03851128 0.17964649 0.04505944 0.04687715
0.03830266 0.04024267 0.03952813 0.04023385]
mean value: 0.062324810028076175
key: score_time
value: [0.0100162 0.01036406 0.01036692 0.01054835 0.00991821 0.00990653
0.00961637 0.00992942 0.00987625 0.01000285]
mean value: 0.010054516792297363
key: test_mcc
value: [0.65151515 1. 0.91605722 0.76764947 0.83971912 0.83971912
0.76277007 0.91605722 1. 0.81818182]
mean value: 0.8511669209848822
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.82608696 1. 0.95652174 0.86956522 0.91304348 0.91304348
0.86956522 0.95652174 1. 0.90909091]
mean value: 0.9213438735177866
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.81818182 1. 0.95238095 0.88 0.90909091 0.90909091
0.88888889 0.96 1. 0.90909091]
mean value: 0.9226724386724386
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81818182 1. 1. 0.78571429 1. 1.
0.8 0.92307692 1. 0.90909091]
mean value: 0.9236063936063936
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 1. 0.90909091 1. 0.83333333 0.83333333
1. 1. 1. 0.90909091]
mean value: 0.9303030303030303
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82575758 1. 0.95454545 0.875 0.91666667 0.91666667
0.86363636 0.95454545 1. 0.90909091]
mean value: 0.9215909090909091
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.69230769 1. 0.90909091 0.78571429 0.83333333 0.83333333
0.8 0.92307692 1. 0.83333333]
mean value: 0.861018981018981
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.52
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01018047 0.03178215 0.03211904 0.0325737 0.03071642 0.03224421
0.03228498 0.0325532 0.03256488 0.03023434]
mean value: 0.02972533702850342
key: score_time
value: [0.01017213 0.02084589 0.02090144 0.01349664 0.02145767 0.02162623
0.01061773 0.01060104 0.01897764 0.02124476]
mean value: 0.016994118690490723
key: test_mcc
value: [0.58002308 0.65151515 0.56490196 0.83971912 0.83971912 0.91666667
0.74047959 0.82575758 0.83205029 0.73029674]
mean value: 0.7521129297642005
key: train_mcc
value: [0.87352395 0.87320324 0.86356283 0.83418999 0.88310329 0.83418999
0.84389872 0.85370265 0.83499081 0.84481947]
mean value: 0.8539184935656506
key: test_accuracy
value: [0.7826087 0.82608696 0.7826087 0.91304348 0.91304348 0.95652174
0.86956522 0.91304348 0.90909091 0.86363636]
mean value: 0.8729249011857707
key: train_accuracy
value: [0.93658537 0.93658537 0.93170732 0.91707317 0.94146341 0.91707317
0.92195122 0.92682927 0.91747573 0.9223301 ]
mean value: 0.9269074117925645
key: test_fscore
value: [0.73684211 0.81818182 0.76190476 0.91666667 0.90909091 0.95652174
0.88 0.91666667 0.9 0.86956522]
mean value: 0.8665439884295719
key: train_fscore
value: [0.93779904 0.93719807 0.93269231 0.91707317 0.94174757 0.91707317
0.92156863 0.92682927 0.9178744 0.92156863]
mean value: 0.9271424251996218
key: test_precision
value: [0.875 0.81818182 0.8 0.84615385 1. 1.
0.84615385 0.91666667 1. 0.83333333]
mean value: 0.893548951048951
key: train_precision
value: [0.9245283 0.93269231 0.92380952 0.92156863 0.93269231 0.91262136
0.92156863 0.9223301 0.91346154 0.93069307]
mean value: 0.9235965760062042
key: test_recall
value: [0.63636364 0.81818182 0.72727273 1. 0.83333333 0.91666667
0.91666667 0.91666667 0.81818182 0.90909091]
mean value: 0.8492424242424242
key: train_recall
value: [0.95145631 0.94174757 0.94174757 0.91262136 0.95098039 0.92156863
0.92156863 0.93137255 0.9223301 0.91262136]
mean value: 0.9308014467923091
key: test_roc_auc
value: [0.77651515 0.82575758 0.78030303 0.91666667 0.91666667 0.95833333
0.86742424 0.91287879 0.90909091 0.86363636]
mean value: 0.8727272727272727
key: train_roc_auc
value: [0.93651247 0.93656006 0.9316581 0.91709499 0.94150961 0.91709499
0.92194936 0.92685132 0.91747573 0.9223301 ]
mean value: 0.9269036740909956
key: test_jcc
value: [0.58333333 0.69230769 0.61538462 0.84615385 0.83333333 0.91666667
0.78571429 0.84615385 0.81818182 0.76923077]
mean value: 0.7706460206460206
key: train_jcc
value: [0.88288288 0.88181818 0.87387387 0.84684685 0.88990826 0.84684685
0.85454545 0.86363636 0.84821429 0.85454545]
mean value: 0.8643118447590925
MCC on Blind test: 0.1
Accuracy on Blind test: 0.55
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01748133 0.00712228 0.00695634 0.00682926 0.00673771 0.00671387
0.00680852 0.0069313 0.00677037 0.00677538]
mean value: 0.007912635803222656
key: score_time
value: [0.00848055 0.0081079 0.0079267 0.0077877 0.00768209 0.00775528
0.00768805 0.00777531 0.00776768 0.00765419]
mean value: 0.007862544059753418
key: test_mcc
value: [0.39727608 0.33371191 0.39393939 0.21969697 0.21452908 0.66414149
0.39727608 0.48856385 0.63636364 0.27272727]
mean value: 0.40182257604909155
key: train_mcc
value: [0.47440586 0.49337247 0.49337247 0.46430782 0.47361912 0.44415883
0.46367706 0.44784529 0.43062816 0.4882291 ]
mean value: 0.46736161990058567
key: test_accuracy
value: [0.69565217 0.65217391 0.69565217 0.60869565 0.60869565 0.82608696
0.69565217 0.73913043 0.81818182 0.63636364]
mean value: 0.6976284584980237
key: train_accuracy
value: [0.73658537 0.74634146 0.74634146 0.73170732 0.73658537 0.72195122
0.73170732 0.72195122 0.71359223 0.74271845]
mean value: 0.7329481411318968
key: test_fscore
value: [0.63157895 0.69230769 0.69565217 0.60869565 0.66666667 0.81818182
0.74074074 0.72727273 0.81818182 0.63636364]
mean value: 0.7035641873170477
key: train_fscore
value: [0.74766355 0.75471698 0.75471698 0.74178404 0.74038462 0.72463768
0.73429952 0.73732719 0.73059361 0.75576037]
mean value: 0.7421884529586577
key: test_precision
value: [0.75 0.6 0.66666667 0.58333333 0.6 0.9
0.66666667 0.8 0.81818182 0.63636364]
mean value: 0.7021212121212121
key: train_precision
value: [0.72072072 0.73394495 0.73394495 0.71818182 0.72641509 0.71428571
0.72380952 0.69565217 0.68965517 0.71929825]
mean value: 0.7175908371535152
key: test_recall
value: [0.54545455 0.81818182 0.72727273 0.63636364 0.75 0.75
0.83333333 0.66666667 0.81818182 0.63636364]
mean value: 0.7181818181818181
key: train_recall
value: [0.77669903 0.77669903 0.77669903 0.76699029 0.75490196 0.73529412
0.74509804 0.78431373 0.77669903 0.7961165 ]
mean value: 0.7689510755758614
key: test_roc_auc
value: [0.68939394 0.65909091 0.6969697 0.60984848 0.60227273 0.82954545
0.68939394 0.74242424 0.81818182 0.63636364]
mean value: 0.6973484848484848
key: train_roc_auc
value: [0.73638873 0.74619265 0.74619265 0.73153436 0.73667428 0.72201599
0.73177232 0.72225395 0.71359223 0.74271845]
mean value: 0.7329335617742243
key: test_jcc
value: [0.46153846 0.52941176 0.53333333 0.4375 0.5 0.69230769
0.58823529 0.57142857 0.69230769 0.46666667]
mean value: 0.5472729476405946
key: train_jcc
value: [0.59701493 0.60606061 0.60606061 0.58955224 0.58778626 0.56818182
0.58015267 0.58394161 0.57553957 0.60740741]
mean value: 0.5901697707371992
MCC on Blind test: 0.43
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0074048 0.00984526 0.01072073 0.00961113 0.01026011 0.01042223
0.01001883 0.01076102 0.01011777 0.01003718]
mean value: 0.00991990566253662
key: score_time
value: [0.00778842 0.00981927 0.00983167 0.01007533 0.01036739 0.01042461
0.01043558 0.01044655 0.01036429 0.01048994]
mean value: 0.010004305839538574
key: test_mcc
value: [0.56490196 0.66414149 0.65151515 0.63327851 0.91666667 0.74047959
0.83743579 0.91605722 0.91287093 0.54772256]
mean value: 0.7385069858830032
key: train_mcc
value: [0.8345235 0.8345235 0.86600321 0.61725542 0.82136935 0.84332727
0.82455974 0.85570033 0.78655606 0.79179983]
mean value: 0.8075618225770705
key: test_accuracy
value: [0.7826087 0.82608696 0.82608696 0.7826087 0.95652174 0.86956522
0.91304348 0.95652174 0.95454545 0.77272727]
mean value: 0.8640316205533597
key: train_accuracy
value: [0.91707317 0.91707317 0.93170732 0.7804878 0.90731707 0.91707317
0.91219512 0.92682927 0.89320388 0.89320388]
mean value: 0.8996163864551268
key: test_fscore
value: [0.76190476 0.83333333 0.81818182 0.81481481 0.95652174 0.88
0.92307692 0.96 0.95652174 0.76190476]
mean value: 0.8666259891477283
key: train_fscore
value: [0.91625616 0.91625616 0.93457944 0.81927711 0.9124424 0.92237443
0.91262136 0.92890995 0.89215686 0.88659794]
mean value: 0.904147180121348
key: test_precision
value: [0.8 0.76923077 0.81818182 0.6875 1. 0.84615385
0.85714286 0.92307692 0.91666667 0.8 ]
mean value: 0.841795288045288
key: train_precision
value: [0.93 0.93 0.9009009 0.69863014 0.86086957 0.86324786
0.90384615 0.89908257 0.9009901 0.94505495]
mean value: 0.8832622233070796
key: test_recall
value: [0.72727273 0.90909091 0.81818182 1. 0.91666667 0.91666667
1. 1. 1. 0.72727273]
mean value: 0.9015151515151515
key: train_recall
value: [0.90291262 0.90291262 0.97087379 0.99029126 0.97058824 0.99019608
0.92156863 0.96078431 0.88349515 0.83495146]
mean value: 0.9328574148105845
key: test_roc_auc
value: [0.78030303 0.82954545 0.82575758 0.79166667 0.95833333 0.86742424
0.90909091 0.95454545 0.95454545 0.77272727]
mean value: 0.8643939393939394
key: train_roc_auc
value: [0.91714259 0.91714259 0.93151532 0.77945936 0.90762421 0.91742814
0.91224062 0.9269941 0.89320388 0.89320388]
mean value: 0.8995954692556635
key: test_jcc
value: [0.61538462 0.71428571 0.69230769 0.6875 0.91666667 0.78571429
0.85714286 0.92307692 0.91666667 0.61538462]
mean value: 0.7724130036630037
key: train_jcc
value: [0.84545455 0.84545455 0.87719298 0.69387755 0.83898305 0.8559322
0.83928571 0.86725664 0.80530973 0.7962963 ]
mean value: 0.8265043260886354
MCC on Blind test: 0.2
Accuracy on Blind test: 0.57
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01033568 0.01053357 0.01022339 0.01038384 0.01062822 0.01051664
0.01035213 0.01098609 0.01057267 0.01157546]
mean value: 0.01061077117919922
key: score_time
value: [0.01053452 0.01041985 0.01034975 0.01045704 0.01039529 0.01041269
0.01073003 0.01061511 0.0108068 0.0110898 ]
mean value: 0.010581088066101075
key: test_mcc
value: [0.33946383 0.39727608 0.65909298 0.76764947 0.83971912 0.76277007
0.76277007 0.82575758 0.83205029 0.63636364]
mean value: 0.6822913134388915
key: train_mcc
value: [0.74004127 0.85570033 0.72342586 0.7674294 0.80545006 0.55024014
0.7696264 0.91224062 0.81572728 0.85473156]
mean value: 0.7794612920006722
key: test_accuracy
value: [0.65217391 0.69565217 0.82608696 0.86956522 0.91304348 0.86956522
0.86956522 0.91304348 0.90909091 0.81818182]
mean value: 0.833596837944664
key: train_accuracy
value: [0.85365854 0.92682927 0.84878049 0.87317073 0.90243902 0.73170732
0.87804878 0.95609756 0.90291262 0.92718447]
mean value: 0.8800828794695714
key: test_fscore
value: [0.5 0.63157895 0.8 0.88 0.90909091 0.88888889
0.88888889 0.91666667 0.91666667 0.81818182]
mean value: 0.8149962785752259
key: train_fscore
value: [0.82954545 0.92462312 0.82681564 0.88695652 0.9 0.78764479
0.88789238 0.95609756 0.90990991 0.92610837]
mean value: 0.8835593743916733
key: test_precision
value: [0.8 0.75 0.88888889 0.78571429 1. 0.8
0.8 0.91666667 0.84615385 0.81818182]
mean value: 0.8405605505605506
key: train_precision
value: [1. 0.95833333 0.97368421 0.80314961 0.91836735 0.64968153
0.81818182 0.95145631 0.8487395 0.94 ]
mean value: 0.8861593650419807
key: test_recall
value: [0.36363636 0.54545455 0.72727273 1. 0.83333333 1.
1. 0.91666667 1. 0.81818182]
mean value: 0.8204545454545454
key: train_recall
value: [0.70873786 0.89320388 0.7184466 0.99029126 0.88235294 1.
0.97058824 0.96078431 0.98058252 0.91262136]
mean value: 0.901760898534171
key: test_roc_auc
value: [0.64015152 0.68939394 0.8219697 0.875 0.91666667 0.86363636
0.86363636 0.91287879 0.90909091 0.81818182]
mean value: 0.831060606060606
key: train_roc_auc
value: [0.85436893 0.9269941 0.84941938 0.87259661 0.90234152 0.73300971
0.878498 0.95612031 0.90291262 0.92718447]
mean value: 0.8803445650104702
key: test_jcc
value: [0.33333333 0.46153846 0.66666667 0.78571429 0.83333333 0.8
0.8 0.84615385 0.84615385 0.69230769]
mean value: 0.7065201465201465
key: train_jcc
value: [0.70873786 0.85981308 0.7047619 0.796875 0.81818182 0.64968153
0.7983871 0.91588785 0.83471074 0.86238532]
mean value: 0.7949422211940016
MCC on Blind test: 0.11
Accuracy on Blind test: 0.54
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.08366251 0.06921864 0.0717814 0.07228327 0.07036138 0.0708344
0.07182002 0.0701704 0.07077861 0.07223129]
mean value: 0.07231419086456299
key: score_time
value: [0.01484537 0.01417661 0.01449132 0.01528001 0.01412058 0.01532507
0.01464701 0.01492405 0.01425171 0.01426148]
mean value: 0.014632320404052735
key: test_mcc
value: [0.83971912 0.91605722 0.91605722 0.58930667 0.83971912 0.91666667
0.76277007 0.82575758 0.91287093 0.81818182]
mean value: 0.833710642294015
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91304348 0.95652174 0.95652174 0.7826087 0.91304348 0.95652174
0.86956522 0.91304348 0.95454545 0.90909091]
mean value: 0.9124505928853754
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91666667 0.95238095 0.95238095 0.8 0.90909091 0.95652174
0.88888889 0.91666667 0.95238095 0.90909091]
mean value: 0.9154068636677333
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.84615385 1. 1. 0.71428571 1. 1.
0.8 0.91666667 1. 0.90909091]
mean value: 0.9186197136197136
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667
1. 0.91666667 0.90909091 0.90909091]
mean value: 0.9212121212121211
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91666667 0.95454545 0.95454545 0.78787879 0.91666667 0.95833333
0.86363636 0.91287879 0.95454545 0.90909091]
mean value: 0.9128787878787878
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84615385 0.90909091 0.90909091 0.66666667 0.83333333 0.91666667
0.8 0.84615385 0.90909091 0.83333333]
mean value: 0.8469580419580419
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.52
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03033328 0.03085208 0.03747296 0.03314781 0.02555704 0.03062606
0.02538133 0.02366614 0.04046106 0.03656936]
mean value: 0.031406712532043454
key: score_time
value: [0.02738023 0.02815413 0.01663828 0.01588154 0.01602888 0.01523161
0.02233171 0.02261209 0.01840067 0.01769853]
mean value: 0.020035767555236818
key: test_mcc
value: [0.83971912 0.83743579 0.76277007 0.66414149 0.91666667 0.82575758
0.83743579 0.91605722 1. 0.91287093]
mean value: 0.851285465730236
key: train_mcc
value: [0.99029126 0.98067587 0.99029126 1. 1. 1.
1. 0.99029126 0.99033794 0.99033794]
mean value: 0.9932225534805602
key: test_accuracy
value: [0.91304348 0.91304348 0.86956522 0.82608696 0.95652174 0.91304348
0.91304348 0.95652174 1. 0.95454545]
mean value: 0.9215415019762846
key: train_accuracy
value: [0.99512195 0.9902439 0.99512195 1. 1. 1.
1. 0.99512195 0.99514563 0.99514563]
mean value: 0.9965901018233483
key: test_fscore
value: [0.91666667 0.9 0.84210526 0.83333333 0.95652174 0.91666667
0.92307692 0.96 1. 0.95652174]
mean value: 0.9204892331162354
key: train_fscore
value: [0.99512195 0.99019608 0.99512195 1. 1. 1.
1. 0.99512195 0.99512195 0.99512195]
mean value: 0.9965805834528934
key: test_precision
value: [0.84615385 1. 1. 0.76923077 1. 0.91666667
0.85714286 0.92307692 1. 0.91666667]
mean value: 0.922893772893773
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.99029126 1. 1. ]
mean value: 0.9990291262135922
key: test_recall
value: [1. 0.81818182 0.72727273 0.90909091 0.91666667 0.91666667
1. 1. 1. 1. ]
mean value: 0.9287878787878788
key: train_recall
value: [0.99029126 0.98058252 0.99029126 1. 1. 1.
1. 1. 0.99029126 0.99029126]
mean value: 0.9941747572815534
key: test_roc_auc
value: [0.91666667 0.90909091 0.86363636 0.82954545 0.95833333 0.91287879
0.90909091 0.95454545 1. 0.95454545]
mean value: 0.9208333333333333
key: train_roc_auc
value: [0.99514563 0.99029126 0.99514563 1. 1. 1.
1. 0.99514563 0.99514563 0.99514563]
mean value: 0.9966019417475728
key: test_jcc
value: [0.84615385 0.81818182 0.72727273 0.71428571 0.91666667 0.84615385
0.85714286 0.92307692 1. 0.91666667]
mean value: 0.8565601065601065
key: train_jcc
value: [0.99029126 0.98058252 0.99029126 1. 1. 1.
1. 0.99029126 0.99029126 0.99029126]
mean value: 0.9932038834951457
MCC on Blind test: 0.13
Accuracy on Blind test: 0.55
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.04991055 0.09063339 0.06661296 0.04403114 0.03577352 0.03729105
0.02286935 0.02291799 0.02340531 0.04758835]
mean value: 0.044103360176086424
key: score_time
value: [0.02371454 0.02476144 0.03154182 0.01158547 0.02154064 0.01150608
0.01171994 0.01175761 0.01140141 0.01663899]
mean value: 0.017616796493530273
key: test_mcc
value: [0.39727608 0.56818182 0.56490196 0.38932432 0.31252706 0.6992059
0.65151515 0.74242424 0.64715023 0.61237244]
mean value: 0.558487918534798
key: train_mcc
value: [0.93174679 0.94146202 0.9024367 0.92194936 0.91224062 0.89272796
0.9024367 0.94146202 0.91266437 0.93243443]
mean value: 0.9191560993974353
key: test_accuracy
value: [0.69565217 0.7826087 0.7826087 0.69565217 0.65217391 0.82608696
0.82608696 0.86956522 0.81818182 0.77272727]
mean value: 0.7721343873517786
key: train_accuracy
value: [0.96585366 0.97073171 0.95121951 0.96097561 0.95609756 0.94634146
0.95121951 0.97073171 0.95631068 0.96601942]
mean value: 0.9595500828794695
key: test_fscore
value: [0.63157895 0.7826087 0.76190476 0.66666667 0.71428571 0.8
0.83333333 0.86956522 0.8 0.70588235]
mean value: 0.7565825689543552
key: train_fscore
value: [0.96618357 0.97087379 0.95145631 0.96116505 0.95609756 0.94634146
0.95098039 0.97058824 0.95652174 0.96650718]
mean value: 0.9596715288515447
key: test_precision
value: [0.75 0.75 0.8 0.7 0.625 1.
0.83333333 0.90909091 0.88888889 1. ]
mean value: 0.8256313131313131
key: train_precision
value: [0.96153846 0.97087379 0.95145631 0.96116505 0.95145631 0.94174757
0.95098039 0.97058824 0.95192308 0.95283019]
mean value: 0.9564559383717978
key: test_recall
value: [0.54545455 0.81818182 0.72727273 0.63636364 0.83333333 0.66666667
0.83333333 0.83333333 0.72727273 0.54545455]
mean value: 0.7166666666666667
key: train_recall
value: [0.97087379 0.97087379 0.95145631 0.96116505 0.96078431 0.95098039
0.95098039 0.97058824 0.96116505 0.98058252]
mean value: 0.9629449838187703
key: test_roc_auc
value: [0.68939394 0.78409091 0.78030303 0.69318182 0.64393939 0.83333333
0.82575758 0.87121212 0.81818182 0.77272727]
mean value: 0.7712121212121212
key: train_roc_auc
value: [0.96582905 0.97073101 0.95121835 0.96097468 0.95612031 0.94636398
0.95121835 0.97073101 0.95631068 0.96601942]
mean value: 0.9595516847515705
key: test_jcc
value: [0.46153846 0.64285714 0.61538462 0.5 0.55555556 0.66666667
0.71428571 0.76923077 0.66666667 0.54545455]
mean value: 0.6137640137640138
key: train_jcc
value: [0.93457944 0.94339623 0.90740741 0.92523364 0.91588785 0.89814815
0.90654206 0.94285714 0.91666667 0.93518519]
mean value: 0.9225903767333851
MCC on Blind test: 0.34
Accuracy on Blind test: 0.67
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.1293633 0.12412262 0.12290406 0.12451863 0.12319613 0.12431335
0.12317395 0.12272644 0.12358046 0.12428999]
mean value: 0.12421889305114746
key: score_time
value: [0.00877905 0.00823951 0.00833726 0.00852513 0.00845313 0.00828242
0.00824165 0.0083313 0.00858903 0.00842285]
mean value: 0.008420133590698242
key: test_mcc
value: [0.74242424 0.91605722 0.91605722 0.76764947 0.83971912 0.91605722
0.83743579 0.91605722 1. 0.81818182]
mean value: 0.866963934561781
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86956522 0.95652174 0.95652174 0.86956522 0.91304348 0.95652174
0.91304348 0.95652174 1. 0.90909091]
mean value: 0.9300395256916996
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.95238095 0.95238095 0.88 0.90909091 0.96
0.92307692 0.96 1. 0.90909091]
mean value: 0.931558586341195
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 1. 1. 0.78571429 1. 0.92307692
0.85714286 0.92307692 1. 0.90909091]
mean value: 0.9231435231435231
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.90909091 0.90909091 1. 0.83333333 1.
1. 1. 1. 0.90909091]
mean value: 0.946969696969697
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87121212 0.95454545 0.95454545 0.875 0.91666667 0.95454545
0.90909091 0.95454545 1. 0.90909091]
mean value: 0.9299242424242424
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.90909091 0.90909091 0.78571429 0.83333333 0.92307692
0.85714286 0.92307692 1. 0.83333333]
mean value: 0.8743090243090244
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.55
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00908375 0.01190424 0.01377177 0.01136947 0.01182413 0.0117712
0.01173353 0.01190329 0.0147233 0.01190543]
mean value: 0.011999011039733887
key: score_time
value: [0.01050138 0.01059628 0.01065016 0.01060724 0.01079369 0.01067948
0.01063824 0.01067424 0.01086879 0.0129354 ]
mean value: 0.010894489288330079
key: test_mcc
value: [0.47727273 0.66414149 0.17236256 0.37057951 0.55048188 0.56490196
0.40451992 0.40451992 0.13245324 0.48795004]
mean value: 0.42291832347791636
key: train_mcc
value: [0.5185658 0.60463182 0.61253896 0.61919584 0.62634721 0.57825573
0.42798979 0.54305523 0.59064979 0.61850654]
mean value: 0.5739736713126434
key: test_accuracy
value: [0.73913043 0.82608696 0.56521739 0.65217391 0.73913043 0.7826087
0.65217391 0.65217391 0.54545455 0.72727273]
mean value: 0.6881422924901186
key: train_accuracy
value: [0.72682927 0.8 0.7902439 0.78536585 0.78536585 0.76585366
0.65365854 0.73658537 0.77669903 0.77669903]
mean value: 0.7597300497276818
key: test_fscore
value: [0.72727273 0.83333333 0.64285714 0.71428571 0.8 0.8
0.75 0.75 0.66666667 0.76923077]
mean value: 0.7453646353646354
key: train_fscore
value: [0.78125 0.81278539 0.82008368 0.82113821 0.82113821 0.80327869
0.74181818 0.78740157 0.80991736 0.81746032]
mean value: 0.8016271610878589
key: test_precision
value: [0.72727273 0.76923077 0.52941176 0.58823529 0.66666667 0.76923077
0.6 0.6 0.52631579 0.66666667]
mean value: 0.6443030447364813
key: train_precision
value: [0.65359477 0.76724138 0.72058824 0.70629371 0.70138889 0.69014085
0.58959538 0.65789474 0.70503597 0.69127517]
mean value: 0.6883049077672215
key: test_recall
value: [0.72727273 0.90909091 0.81818182 0.90909091 1. 0.83333333
1. 1. 0.90909091 0.90909091]
mean value: 0.9015151515151515
key: train_recall
value: [0.97087379 0.86407767 0.95145631 0.98058252 0.99019608 0.96078431
1. 0.98039216 0.95145631 1. ]
mean value: 0.9649819150961355
key: test_roc_auc
value: [0.73863636 0.82954545 0.57575758 0.66287879 0.72727273 0.78030303
0.63636364 0.63636364 0.54545455 0.72727273]
mean value: 0.6859848484848485
key: train_roc_auc
value: [0.72563297 0.79968589 0.78945365 0.78440891 0.78636018 0.76679992
0.65533981 0.73776889 0.77669903 0.77669903]
mean value: 0.7598848277174948
key: test_jcc
value: [0.57142857 0.71428571 0.47368421 0.55555556 0.66666667 0.66666667
0.6 0.6 0.5 0.625 ]
mean value: 0.597328738512949
key: train_jcc
value: [0.64102564 0.68461538 0.69503546 0.69655172 0.69655172 0.67123288
0.58959538 0.64935065 0.68055556 0.69127517]
mean value: 0.6695789560036107
MCC on Blind test: 0.35
Accuracy on Blind test: 0.62
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01302195 0.01023006 0.01021552 0.01021671 0.01030731 0.01027513
0.01022935 0.01022243 0.01024985 0.01030946]
mean value: 0.010527777671813964
key: score_time
value: [0.01048541 0.0103786 0.01038289 0.01038122 0.01036906 0.01034164
0.01037383 0.01037812 0.01037979 0.01041484]
mean value: 0.010388541221618652
key: test_mcc
value: [0.58002308 0.74242424 0.65909298 0.74242424 0.83971912 0.91666667
0.83743579 0.82575758 0.91287093 0.63636364]
mean value: 0.7692778262419881
key: train_mcc
value: [0.85368872 0.84407425 0.86341138 0.84407425 0.82438607 0.81495251
0.84389872 0.81495251 0.83499081 0.85473156]
mean value: 0.8393160802573247
key: test_accuracy
value: [0.7826087 0.86956522 0.82608696 0.86956522 0.91304348 0.95652174
0.91304348 0.91304348 0.95454545 0.81818182]
mean value: 0.8816205533596838
key: train_accuracy
value: [0.92682927 0.92195122 0.93170732 0.92195122 0.91219512 0.90731707
0.92195122 0.90731707 0.91747573 0.92718447]
mean value: 0.9195879706369879
key: test_fscore
value: [0.73684211 0.86956522 0.8 0.86956522 0.90909091 0.95652174
0.92307692 0.91666667 0.95238095 0.81818182]
mean value: 0.875189154857347
key: train_fscore
value: [0.92753623 0.92156863 0.93203883 0.92156863 0.91176471 0.90547264
0.92156863 0.90547264 0.9178744 0.92610837]
mean value: 0.9190973699222151
key: test_precision
value: [0.875 0.83333333 0.88888889 0.83333333 1. 1.
0.85714286 0.91666667 1. 0.81818182]
mean value: 0.9022546897546897
key: train_precision
value: [0.92307692 0.93069307 0.93203883 0.93069307 0.91176471 0.91919192
0.92156863 0.91919192 0.91346154 0.94 ]
mean value: 0.9241680606820951
key: test_recall
value: [0.63636364 0.90909091 0.72727273 0.90909091 0.83333333 0.91666667
1. 0.91666667 0.90909091 0.81818182]
mean value: 0.8575757575757575
key: train_recall
value: [0.93203883 0.91262136 0.93203883 0.91262136 0.91176471 0.89215686
0.92156863 0.89215686 0.9223301 0.91262136]
mean value: 0.9141918903483723
key: test_roc_auc
value: [0.77651515 0.87121212 0.8219697 0.87121212 0.91666667 0.95833333
0.90909091 0.91287879 0.95454545 0.81818182]
mean value: 0.8810606060606061
key: train_roc_auc
value: [0.92680373 0.92199695 0.93170569 0.92199695 0.91219303 0.90724348
0.92194936 0.90724348 0.91747573 0.92718447]
mean value: 0.91957928802589
key: test_jcc
value: [0.58333333 0.76923077 0.66666667 0.76923077 0.83333333 0.91666667
0.85714286 0.84615385 0.90909091 0.69230769]
mean value: 0.7843156843156843
key: train_jcc
value: [0.86486486 0.85454545 0.87272727 0.85454545 0.83783784 0.82727273
0.85454545 0.82727273 0.84821429 0.86238532]
mean value: 0.8504211400426996
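Each `key:` block lists one score per cross-validation fold (ten folds here), followed by the arithmetic mean of those scores. The key names (`fit_time`, `score_time`, `test_*`, `train_*`) match what scikit-learn's `cross_validate` returns when `return_train_score=True`. That is presumably how these keys were produced, although the script itself is not shown.

The sketch below recomputes the reported mean from the RidgeClassifier `test_mcc` values copied from the log. It also reports the worst and best folds, which the log does not print. The helper name `summarise` is hypothetical.

```python
from statistics import mean

# Per-fold test MCC for RidgeClassifier, copied from the log above.
test_mcc = [0.58002308, 0.74242424, 0.65909298, 0.74242424, 0.83971912,
            0.91666667, 0.83743579, 0.82575758, 0.91287093, 0.63636364]


def summarise(scores):
    """Hypothetical helper: return (mean, worst fold, best fold) of per-fold scores."""
    return mean(scores), min(scores), max(scores)


avg, worst, best = summarise(test_mcc)
print(f"mean={avg:.4f} worst={worst:.4f} best={best:.4f}")
```

The recomputed mean agrees with the logged `mean value` to about nine decimal places. The small difference comes from the per-fold values being printed rounded to eight decimals.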
MCC on Blind test: 0.19
Accuracy on Blind test: 0.59
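The blind-test lines report the Matthews correlation coefficient (MCC) and accuracy on a held-out set, rounded to two decimals.

Below is a minimal pure-Python sketch of both metrics for binary 0/1 labels. It uses the standard formula MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Two things are assumptions for illustration:

- The example labels are made up.
- Returning 0 when the denominator is zero is a convention chosen for this sketch.

```python
import math


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when any confusion-matrix margin is empty."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(round(mcc(y_true, y_pred), 2), round(accuracy(y_true, y_pred), 2))  # prints "0.5 0.75"
```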
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
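The model list above contains `('Naive Bayes', BernoulliNB())` twice: once after `'Gaussian NB'` and again after `'Random Forest2'`. As a result, the same estimator is probably fitted and reported twice.

Below is a minimal sketch for detecting and dropping repeated names in a list of `(name, estimator)` pairs. The helper names `duplicate_names` and `dedupe` are hypothetical. The name strings are copied verbatim from the log, so no scikit-learn import is needed.

```python
from collections import Counter

# Model names in the order they appear in the list above (copied verbatim from the log).
model_names = [
    "Logistic Regression", "Logistic RegressionCV", "Gaussian NB", "Naive Bayes",
    "K-Nearest Neighbors", "SVM", "MLP", "Decision Tree", "Extra Trees",
    "Extra Tree", "Random Forest", "Random Forest2", "Naive Bayes", "XGBoost",
    "LDA", "Multinomial", "Passive Aggresive", "Stochastic GDescent",
    "AdaBoost Classifier", "Bagging Classifier", "Gaussian Process",
    "Gradient Boosting", "QDA", "Ridge Classifier", "Ridge ClassifierCV",
]


def duplicate_names(names):
    """Hypothetical helper: names occurring more than once, in first-seen order."""
    counts = Counter(names)
    return [n for n in dict.fromkeys(names) if counts[n] > 1]


def dedupe(pairs):
    """Hypothetical helper: keep only the first (name, estimator) pair for each name."""
    seen = set()
    kept = []
    for name, est in pairs:
        if name not in seen:
            seen.add(name)
            kept.append((name, est))
    return kept


print(duplicate_names(model_names))  # prints "['Naive Bayes']"
```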
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm',
'electro_ss', 'disulfide_rr', 'disulfide_mm', 'd...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.08454299 0.08204246 0.08172989 0.0863173 0.09503531 0.08147788
0.08158469 0.08152223 0.09179831 0.08183026]
mean value: 0.08478813171386719
key: score_time
value: [0.01065302 0.01066089 0.01067209 0.01064348 0.01065207 0.010607
0.010638 0.01059103 0.01064205 0.01067996]
mean value: 0.010643959045410156
key: test_mcc
value: [0.58002308 0.65151515 0.65909298 0.74242424 0.83971912 0.91666667
0.83743579 0.82575758 0.91287093 0.73029674]
mean value: 0.7695802278487375
key: train_mcc
value: [0.85368872 0.87320324 0.86356283 0.84407425 0.87321531 0.83417421
0.84389872 0.85370265 0.83499081 0.86407767]
mean value: 0.8538588407839809
key: test_accuracy
value: [0.7826087 0.82608696 0.82608696 0.86956522 0.91304348 0.95652174
0.91304348 0.91304348 0.95454545 0.86363636]
mean value: 0.8818181818181818
key: train_accuracy
value: [0.92682927 0.93658537 0.93170732 0.92195122 0.93658537 0.91707317
0.92195122 0.92682927 0.91747573 0.93203883]
mean value: 0.9269026758228748
key: test_fscore
value: [0.73684211 0.81818182 0.8 0.86956522 0.90909091 0.95652174
0.92307692 0.91666667 0.95238095 0.86956522]
mean value: 0.875189154857347
key: train_fscore
value: [0.92753623 0.93719807 0.93269231 0.92156863 0.93658537 0.91625616
0.92156863 0.92682927 0.9178744 0.93203883]
mean value: 0.9270147884979708
key: test_precision
value: [0.875 0.81818182 0.88888889 0.83333333 1. 1.
0.85714286 0.91666667 1. 0.83333333]
mean value: 0.9022546897546897
key: train_precision
value: [0.92307692 0.93269231 0.92380952 0.93069307 0.93203883 0.92079208
0.92156863 0.9223301 0.91346154 0.93203883]
mean value: 0.9252501835996416
key: test_recall
value: [0.63636364 0.81818182 0.72727273 0.90909091 0.83333333 0.91666667
1. 0.91666667 0.90909091 0.90909091]
mean value: 0.8575757575757575
key: train_recall
value: [0.93203883 0.94174757 0.94174757 0.91262136 0.94117647 0.91176471
0.92156863 0.93137255 0.9223301 0.93203883]
mean value: 0.9288406624785837
key: test_roc_auc
value: [0.77651515 0.82575758 0.8219697 0.87121212 0.91666667 0.95833333
0.90909091 0.91287879 0.95454545 0.86363636]
mean value: 0.8810606060606061
key: train_roc_auc
value: [0.92680373 0.93656006 0.9316581 0.92199695 0.93660765 0.9170474
0.92194936 0.92685132 0.91747573 0.93203883]
mean value: 0.9268989149057681
key: test_jcc
value: [0.58333333 0.69230769 0.66666667 0.76923077 0.83333333 0.91666667
0.85714286 0.84615385 0.90909091 0.76923077]
mean value: 0.7843156843156843
key: train_jcc
value: [0.86486486 0.88181818 0.87387387 0.85454545 0.88073394 0.84545455
0.85454545 0.86363636 0.84821429 0.87272727]
mean value: 0.8640414242134425
MCC on Blind test: 0.12
Accuracy on Blind test: 0.56
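This log shows `SettingWithCopyWarning` at `pnca_config.py:203` and `:206`. pandas raises it because `sort_values(..., inplace=True)` runs on a DataFrame that pandas treats as derived from another frame.

Below is a minimal sketch of two common ways to avoid the warning: take an explicit `.copy()` first, or drop `inplace=True` and rebind the name. It rests on a few assumptions, since the script itself is not shown:

- `rouC_CT` and `rouC_BT` are column subsets of a larger results table.
- The name `results` and its columns are hypothetical.

The MCC values are taken from the two Ridge runs in this log.

```python
import pandas as pd

# Hypothetical results table; MCC values copied from the Ridge runs in this log.
results = pd.DataFrame({
    "model_name": ["Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.7692778262419881, 0.7695802278487375],
    "bts_mcc": [0.19, 0.12],
})

# Option 1: take an explicit copy of the column subset before sorting in place.
rouC_CT = results[["model_name", "test_mcc"]].copy()
rouC_CT.sort_values(by=["test_mcc"], ascending=False, inplace=True)

# Option 2: skip inplace and rebind the name to the new, sorted DataFrame.
rouC_BT = results[["model_name", "bts_mcc"]]
rouC_BT = rouC_BT.sort_values(by=["bts_mcc"], ascending=False)

print(rouC_CT.iloc[0]["model_name"], "|", rouC_BT.iloc[0]["model_name"])
```

Both options leave `results` in its original row order. Option 2 is usually the simpler choice, because it never modifies an object that might share data with another frame.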