LSHTM_analysis/scripts/ml/log_rpob_cd_7030.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176
-------------------------------------------------------------
Successfully split data with stratification [COMPLETE data]: 70/30
Original data size: (1132, 176)
Train data size: (758, 176)
Test data size: (374, 176)
y_train numbers: Counter({0: 554, 1: 204})
y_train ratio: 2.715686274509804
y_test_numbers: Counter({0: 273, 1: 101})
y_test ratio: 2.702970297029703
-------------------------------------------------------------
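The two ratios printed above are consistent with dividing the majority-class count by the minority-class count. A minimal sketch of that arithmetic, assuming this definition; the helper name class_ratio is hypothetical and is not taken from ml_data_cd_7030.py:

```python
from collections import Counter


def class_ratio(counts):
    """Return the majority-class count divided by the minority-class count."""
    majority = max(counts.values())
    minority = min(counts.values())
    return majority / minority


# Class counts copied from the log lines above.
y_train = Counter({0: 554, 1: 204})
y_test = Counter({0: 273, 1: 101})

print("y_train ratio:", class_ratio(y_train))
print("y_test ratio:", class_ratio(y_test))
```

Near-identical ratios in the two partitions are what a stratified 70/30 split is expected to produce.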
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 554, 1: 204}) Data dim: (758, 176)
Simple Random OverSampling
Counter({0: 554, 1: 554})
(1108, 176)
Simple Random UnderSampling
Counter({0: 204, 1: 204})
(408, 176)
Simple Combined Over and UnderSampling
Counter({0: 554, 1: 554})
(1108, 176)
SMOTE_NC OverSampling
Counter({0: 554, 1: 554})
(1108, 176)
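The log does not show which library produced the resampled counts above. As a rough illustration only, the simple random over- and undersampling counts can be reproduced with the standard library; the function names below are hypothetical and SMOTE_NC is not covered by this sketch:

```python
import random
from collections import Counter


def _indices_by_class(labels):
    """Group row indices by their class label."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return groups


def random_oversample(labels, seed=42):
    """Balance classes by redrawing minority rows with replacement."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = max(len(ix) for ix in groups.values())
    out = []
    for ix in groups.values():
        out.extend(ix)
        out.extend(rng.choices(ix, k=target - len(ix)))
    return out


def random_undersample(labels, seed=42):
    """Balance classes by keeping a random subset of majority rows."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = min(len(ix) for ix in groups.values())
    out = []
    for ix in groups.values():
        out.extend(rng.sample(ix, target))
    return out


# Training labels with the class counts reported in the log.
labels = [0] * 554 + [1] * 204
over = random_oversample(labels)
under = random_undersample(labels)
print(Counter(labels[i] for i in over), len(over))
print(Counter(labels[i] for i in under), len(under))
```

Both helpers return row indices rather than data, so the same indices could be used to select rows from a feature matrix and a target vector together.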
#####################################################################
Running ML analysis [COMPLETE DATA]: 70/30 split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_cd_7030/
Sanity checks:
Total input features: 176
Training data size: (758, 176)
Test data size: (374, 176)
Target feature numbers (training data): Counter({0: 554, 1: 204})
Target features ratio (training data): 2.715686274509804
Target feature numbers (test data): Counter({0: 273, 1: 101})
Target features ratio (test data): 2.702970297029703
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04495931 0.05544949 0.04286551 0.04531431 0.03928089 0.03883028
0.04265618 0.03910923 0.03948498 0.03936028]
mean value: 0.04273104667663574
key: score_time
value: [0.0126586 0.01351953 0.01310349 0.01323962 0.01216793 0.01326966
0.01327348 0.01291776 0.013165 0.01312828]
mean value: 0.013044333457946778
key: test_mcc
value: [0.48304589 0.63304195 0.52205834 0.60684923 0.64368314 0.28501393
0.52663543 0.56622086 0.68863504 0.56917479]
mean value: 0.5524358605438721
key: train_mcc
value: [0.71022006 0.6979828 0.6967624 0.70601791 0.69141924 0.7044883
0.72431165 0.72934313 0.70196615 0.68425989]
mean value: 0.7046771545445295
key: test_accuracy
value: [0.80263158 0.85526316 0.82894737 0.84210526 0.85526316 0.73684211
0.81578947 0.82894737 0.88 0.84 ]
mean value: 0.8285789473684211
key: train_accuracy
value: [0.88709677 0.88416422 0.88269795 0.8856305 0.88123167 0.88416422
0.89296188 0.89442815 0.88433382 0.87847731]
mean value: 0.8855186493948124
key: test_fscore
value: [0.61538462 0.73170732 0.60606061 0.71428571 0.74418605 0.44444444
0.65 0.68292683 0.76923077 0.66666667]
mean value: 0.6624893008925907
key: train_fscore
value: [0.7867036 0.77363897 0.7752809 0.78333333 0.77053824 0.78356164
0.79665738 0.80110497 0.77994429 0.76487252]
mean value: 0.7815635854192167
key: test_precision
value: [0.63157895 0.71428571 0.76923077 0.68181818 0.72727273 0.53333333
0.68421053 0.7 0.78947368 0.75 ]
mean value: 0.6981203883835463
key: train_precision
value: [0.80225989 0.81818182 0.80232558 0.80113636 0.8 0.78571429
0.8125 0.81005587 0.8 0.79881657]
mean value: 0.8030990369902591
key: test_recall
value: [0.6 0.75 0.5 0.75 0.76190476 0.38095238
0.61904762 0.66666667 0.75 0.6 ]
mean value: 0.6378571428571429
key: train_recall
value: [0.77173913 0.73369565 0.75 0.76630435 0.7431694 0.78142077
0.78142077 0.79234973 0.76086957 0.73369565]
mean value: 0.7614665003563792
key: test_roc_auc
value: [0.7375 0.82142857 0.72321429 0.8125 0.82640693 0.62683983
0.75497835 0.77878788 0.83863636 0.76363636]
mean value: 0.7683928571428571
key: train_roc_auc
value: [0.850729 0.83672734 0.84086345 0.84801161 0.83751656 0.85163223
0.85764425 0.86210673 0.84536464 0.83277969]
mean value: 0.8463375511496104
key: test_jcc
value: [0.44444444 0.57692308 0.43478261 0.55555556 0.59259259 0.28571429
0.48148148 0.51851852 0.625 0.5 ]
mean value: 0.5015012563925607
key: train_jcc
value: [0.64840183 0.63084112 0.63302752 0.64383562 0.62672811 0.64414414
0.66203704 0.66820276 0.63926941 0.61926606]
mean value: 0.6415753605549265
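For a binary target, the Jaccard score (jcc) and the F-score of the same fold are linked by J = F / (2 - F), since J = TP / (TP + FP + FN) and F = 2TP / (2TP + FP + FN). A small sketch that checks this identity against the per-fold values printed above; the function name is hypothetical:

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score into the matching Jaccard index."""
    return f1 / (2.0 - f1)


# First three folds of test_fscore and test_jcc, copied from the log.
test_fscore = [0.61538462, 0.73170732, 0.60606061]
test_jcc = [0.44444444, 0.57692308, 0.43478261]

for f1, jcc in zip(test_fscore, test_jcc):
    print(f"F1={f1:.8f}  derived jcc={jaccard_from_f1(f1):.8f}  logged jcc={jcc:.8f}")
```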
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
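MCC is reported alongside accuracy because the target is imbalanced (roughly 2.7 to 1). A minimal sketch of the Matthews correlation coefficient computed from confusion-matrix counts; the counts used here are purely illustrative and are not taken from this run:

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any marginal total is zero and the ratio is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, chosen only to illustrate the formula.
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 4))
```

Unlike accuracy, the score only approaches 1 when both classes are predicted well, which makes it harder to inflate by favouring the majority class.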
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.11514831 1.08928394 1.08557916 0.98781013 1.01957607 0.89390969
0.96477437 1.0153501 0.90592122 1.03056788]
mean value: 1.010792088508606
key: score_time
value: [0.01500988 0.01334858 0.01334858 0.01227927 0.01526356 0.01327825
0.01342034 0.01259232 0.0133462 0.01332021]
mean value: 0.01352071762084961
key: test_mcc
value: [0.51048128 0.621059 0.79080467 0.71205323 0.64368314 0.35894245
0.52663543 0.53939394 0.6983799 0.59090909]
mean value: 0.5992342136550841
key: train_mcc
value: [0.83684028 0.77396514 0.82137245 0.76171102 0.77379327 0.76434019
0.76695008 0.77518908 0.78917499 0.81581794]
mean value: 0.7879154432429085
key: test_accuracy
value: [0.81578947 0.85526316 0.92105263 0.88157895 0.85526316 0.76315789
0.81578947 0.81578947 0.88 0.84 ]
mean value: 0.8443684210526315
key: train_accuracy
value: [0.93548387 0.91202346 0.92961877 0.90762463 0.91202346 0.90762463
0.90909091 0.91202346 0.91800878 0.92825769]
mean value: 0.9171779667930425
key: test_fscore
value: [0.63157895 0.71794872 0.83333333 0.79069767 0.74418605 0.5
0.65 0.66666667 0.7804878 0.7 ]
mean value: 0.701489919112542
key: train_fscore
value: [0.88108108 0.83333333 0.86956522 0.82352941 0.83333333 0.82739726
0.82872928 0.83516484 0.84444444 0.86426593]
mean value: 0.8440844126532805
key: test_precision
value: [0.66666667 0.73684211 0.9375 0.73913043 0.72727273 0.6
0.68421053 0.66666667 0.76190476 0.7 ]
mean value: 0.7220193888872378
key: train_precision
value: [0.87634409 0.85227273 0.86956522 0.84971098 0.84745763 0.82967033
0.83798883 0.83977901 0.86363636 0.88135593]
mean value: 0.8547781098313728
key: test_recall
value: [0.6 0.7 0.75 0.85 0.76190476 0.42857143
0.61904762 0.66666667 0.8 0.7 ]
mean value: 0.6876190476190476
key: train_recall
value: [0.88586957 0.81521739 0.86956522 0.79891304 0.81967213 0.82513661
0.81967213 0.83060109 0.82608696 0.84782609]
mean value: 0.833856022808268
key: test_roc_auc
value: [0.74642857 0.80535714 0.86607143 0.87142857 0.82640693 0.65974026
0.75497835 0.76969697 0.85454545 0.79545455]
mean value: 0.7950108225108226
key: train_roc_auc
value: [0.91984241 0.88150428 0.91068622 0.8733521 0.88278196 0.88150618
0.88077795 0.88624243 0.88899538 0.90287096]
mean value: 0.8908559878389313
key: test_jcc
value: [0.46153846 0.56 0.71428571 0.65384615 0.59259259 0.33333333
0.48148148 0.5 0.64 0.53846154]
mean value: 0.5475539275539275
key: train_jcc
value: [0.78743961 0.71428571 0.76923077 0.7 0.71428571 0.70560748
0.70754717 0.71698113 0.73076923 0.76097561]
mean value: 0.7307122430376403
MCC on Blind test: 0.63
Accuracy on Blind test: 0.86
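Note on the metric keys above: the per-fold values are consistent with the standard confusion-matrix definitions. The sketch below recomputes them for the first LogisticRegressionCV test fold. The counts (tp, fn, fp, tn) are not printed in the log; they are reconstructed here from that fold's logged precision, recall and accuracy, assuming a 76-sample fold. The logged test_roc_auc for this fold equals the balanced accuracy, which is what ROC AUC reduces to when it is computed from hard class labels rather than probabilities; whether the script does this is an inference, not something the log states.

```python
from math import sqrt

# Reconstructed (assumed) confusion-matrix counts for fold 1; not in the log.
tp, fn, fp, tn = 12, 8, 6, 50

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * precision * recall / (precision + recall)

# Jaccard index (test_jcc): overlap of predicted and true positives.
jcc = tp / (tp + fp + fn)

# Matthews correlation coefficient (test_mcc).
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# ROC AUC on hard labels reduces to balanced accuracy.
balanced_accuracy = (recall + specificity) / 2

for name, value in [
    ("accuracy", accuracy), ("precision", precision), ("recall", recall),
    ("fscore", fscore), ("jcc", jcc), ("mcc", mcc),
    ("balanced_accuracy", balanced_accuracy),
]:
    print(f"{name}: {value:.8f}")
```

The F-score and Jaccard index are also tied by jcc = fscore / (2 - fscore), so the two columns rank folds identically.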
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01584315 0.01114631 0.01075149 0.01200366 0.01082635 0.01082635
0.01086259 0.01103044 0.01095486 0.01122904]
mean value: 0.011547422409057618
key: score_time
value: [0.00997329 0.00975752 0.00908828 0.00989461 0.00902319 0.00905466
0.00912499 0.00912809 0.00912166 0.00911331]
mean value: 0.009327960014343262
key: test_mcc
value: [0.38785122 0.36539907 0.43046947 0.43046947 0.34199134 0.42559698
0.34376305 0.38523946 0.6048462 0.71055169]
mean value: 0.44261779449865774
key: train_mcc
value: [0.50060055 0.53153318 0.48417992 0.49334101 0.49196558 0.50838861
0.4936543 0.46999696 0.50410801 0.4705211 ]
mean value: 0.49482892039129556
key: test_accuracy
value: [0.73684211 0.73684211 0.75 0.75 0.73684211 0.76315789
0.69736842 0.71052632 0.84 0.88 ]
mean value: 0.7601578947368421
key: train_accuracy
value: [0.78592375 0.80058651 0.77859238 0.78152493 0.78152493 0.7888563
0.7888563 0.74633431 0.79062958 0.77306003]
mean value: 0.7815889018174949
key: test_fscore
value: [0.56521739 0.54545455 0.59574468 0.59574468 0.52380952 0.59090909
0.54901961 0.57692308 0.71428571 0.79069767]
mean value: 0.6047805986650169
key: train_fscore
value: [0.64563107 0.66666667 0.63438257 0.64096386 0.63922518 0.65048544
0.63819095 0.62634989 0.64691358 0.62469734]
mean value: 0.6413506538717907
key: test_precision
value: [0.5 0.5 0.51851852 0.51851852 0.52380952 0.56521739
0.46666667 0.48387097 0.68181818 0.73913043]
mean value: 0.5497550203160301
key: train_precision
value: [0.58333333 0.60714286 0.5720524 0.57575758 0.57391304 0.58515284
0.59069767 0.51785714 0.59276018 0.56331878]
mean value: 0.5761985825450499
key: test_recall
value: [0.65 0.6 0.7 0.7 0.52380952 0.61904762
0.66666667 0.71428571 0.75 0.85 ]
mean value: 0.6773809523809524
key: train_recall
value: [0.72282609 0.73913043 0.71195652 0.72282609 0.72131148 0.73224044
0.69398907 0.79234973 0.71195652 0.70108696]
mean value: 0.7249673319078166
key: test_roc_auc
value: [0.70892857 0.69285714 0.73392857 0.73392857 0.67099567 0.71861472
0.68787879 0.71168831 0.81136364 0.87045455]
mean value: 0.7340638528138528
key: train_roc_auc
value: [0.76603152 0.7812118 0.75758469 0.76301947 0.76245934 0.77092984
0.75881818 0.76090432 0.7657979 0.75034308]
mean value: 0.7637100142327954
key: test_jcc
value: [0.39393939 0.375 0.42424242 0.42424242 0.35483871 0.41935484
0.37837838 0.40540541 0.55555556 0.65384615]
mean value: 0.43848032839968326
key: train_jcc
value: [0.47670251 0.5 0.46453901 0.47163121 0.46975089 0.48201439
0.46863469 0.45597484 0.47810219 0.45422535]
mean value: 0.4721575070903312
MCC on Blind test: 0.41
Accuracy on Blind test: 0.75
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01117373 0.01130366 0.01122308 0.0110395 0.01203394 0.01106477
0.01097441 0.01187778 0.01183271 0.01108146]
mean value: 0.011360502243041993
key: score_time
value: [0.00908971 0.0091331 0.00910974 0.00899458 0.00902104 0.00903058
0.00908709 0.00987816 0.00898623 0.00970459]
mean value: 0.009203481674194335
key: test_mcc
value: [0.42968224 0.45714286 0.53968028 0.54096275 0.43257867 0.38416102
0.52663543 0.37442392 0.52764485 0.4955746 ]
mean value: 0.4708486628265047
key: train_mcc
value: [0.54291625 0.53559377 0.53542123 0.56534397 0.53524411 0.55345132
0.5311225 0.54923819 0.54615584 0.52355185]
mean value: 0.5418039021364769
key: test_accuracy
value: [0.76315789 0.78947368 0.82894737 0.81578947 0.77631579 0.75
0.81578947 0.76315789 0.82666667 0.81333333]
mean value: 0.7942631578947369
key: train_accuracy
value: [0.82111437 0.82111437 0.81818182 0.82844575 0.81964809 0.82404692
0.81524927 0.82111437 0.82284041 0.81551977]
mean value: 0.8207275131707192
key: test_fscore
value: [0.59090909 0.6 0.64864865 0.66666667 0.58536585 0.55813953
0.65 0.52631579 0.62857143 0.61111111]
mean value: 0.6065728123922888
key: train_fscore
value: [0.66483516 0.65536723 0.65934066 0.68292683 0.65738162 0.67391304
0.6576087 0.67204301 0.66666667 0.64804469]
mean value: 0.663812760996864
key: test_precision
value: [0.54166667 0.6 0.70588235 0.63636364 0.6 0.54545455
0.68421053 0.58823529 0.73333333 0.6875 ]
mean value: 0.6322646355192795
key: train_precision
value: [0.67222222 0.68235294 0.66666667 0.68108108 0.67045455 0.67027027
0.65405405 0.66137566 0.67597765 0.66666667]
mean value: 0.6701121762598923
key: test_recall
value: [0.65 0.6 0.6 0.7 0.57142857 0.57142857
0.61904762 0.47619048 0.55 0.55 ]
mean value: 0.5888095238095238
key: train_recall
value: [0.6576087 0.63043478 0.65217391 0.68478261 0.64480874 0.67759563
0.66120219 0.68306011 0.6576087 0.63043478]
mean value: 0.6579710144927536
key: test_roc_auc
value: [0.72678571 0.72857143 0.75535714 0.77857143 0.71298701 0.69480519
0.75497835 0.67445887 0.73863636 0.72954545]
mean value: 0.729469696969697
key: train_roc_auc
value: [0.7695674 0.76100052 0.76584599 0.78315436 0.76428814 0.77767557
0.76647284 0.7774018 0.77068812 0.75710116]
mean value: 0.7693195890646318
key: test_jcc
value: [0.41935484 0.42857143 0.48 0.5 0.4137931 0.38709677
0.48148148 0.35714286 0.45833333 0.44 ]
mean value: 0.4365773816880602
key: train_jcc
value: [0.49794239 0.48739496 0.49180328 0.51851852 0.48962656 0.50819672
0.48987854 0.50607287 0.5 0.47933884]
mean value: 0.496877267932884
MCC on Blind test: 0.47
Accuracy on Blind test: 0.8
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01400113 0.01252866 0.0119729 0.01145363 0.01141572 0.01111579
0.01188993 0.01115608 0.01220417 0.01183653]
mean value: 0.011957454681396484
key: score_time
value: [0.08364654 0.01389098 0.01768208 0.01454377 0.01357603 0.01330209
0.01355171 0.01357627 0.0139761 0.0141139 ]
mean value: 0.021185946464538575
key: test_mcc
value: [0.18448201 0.47873298 0.36335261 0.44019762 0.28501393 0.23769831
0.3688017 0.3688017 0.48492277 0.37787109]
mean value: 0.3589874726318814
key: train_mcc
value: [0.55081621 0.55081621 0.53338855 0.5865119 0.52088082 0.55352855
0.56690744 0.5531166 0.5917446 0.5373988 ]
mean value: 0.5545109680698154
key: test_accuracy
value: [0.73684211 0.81578947 0.77631579 0.78947368 0.73684211 0.72368421
0.77631579 0.77631579 0.81333333 0.78666667]
mean value: 0.773157894736842
key: train_accuracy
value: [0.83577713 0.83577713 0.82991202 0.84750733 0.82697947 0.8372434
0.84164223 0.8372434 0.84919473 0.83162518]
mean value: 0.8372902023589219
key: test_fscore
value: [0.28571429 0.5625 0.48484848 0.57894737 0.44444444 0.4
0.4516129 0.4516129 0.58823529 0.38461538]
mean value: 0.46325310686129123
key: train_fscore
value: [0.61111111 0.61111111 0.61073826 0.65100671 0.58741259 0.62626263
0.63758389 0.6185567 0.64604811 0.60750853]
mean value: 0.620733963837761
key: test_precision
value: [0.5 0.75 0.61538462 0.61111111 0.53333333 0.5
0.7 0.7 0.71428571 0.83333333]
mean value: 0.6457448107448107
key: train_precision
value: [0.84615385 0.84615385 0.79824561 0.85087719 0.81553398 0.81578947
0.82608696 0.83333333 0.87850467 0.81651376]
mean value: 0.832719267781213
key: test_recall
value: [0.2 0.45 0.4 0.55 0.38095238 0.33333333
0.33333333 0.33333333 0.5 0.25 ]
mean value: 0.3730952380952381
key: train_recall
value: [0.47826087 0.47826087 0.49456522 0.52717391 0.45901639 0.50819672
0.51912568 0.49180328 0.51086957 0.48369565]
mean value: 0.4950968163459254
key: test_roc_auc
value: [0.56428571 0.69821429 0.65535714 0.7125 0.62683983 0.6030303
0.63939394 0.63939394 0.71363636 0.61590909]
mean value: 0.6468560606060606
key: train_roc_auc
value: [0.72306618 0.72306618 0.72419024 0.74651868 0.71047012 0.73305628
0.73952276 0.72786557 0.74240873 0.72180775]
mean value: 0.7291972480213341
key: test_jcc
value: [0.16666667 0.39130435 0.32 0.40740741 0.28571429 0.25
0.29166667 0.29166667 0.41666667 0.23809524]
mean value: 0.3059187945709685
key: train_jcc
value: [0.44 0.44 0.43961353 0.48258706 0.41584158 0.45588235
0.4679803 0.44776119 0.47715736 0.43627451]
mean value: 0.45030978881526235
MCC on Blind test: 0.34
Accuracy on Blind test: 0.77
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03588939 0.03526592 0.03247428 0.03409171 0.03312564 0.03497338
0.03531742 0.02957702 0.03257608 0.03178263]
mean value: 0.033507347106933594
key: score_time
value: [0.01449561 0.01631856 0.01578045 0.01557374 0.01439095 0.01470923
0.01556349 0.01594019 0.0141511 0.01432633]
mean value: 0.015124964714050292
key: test_mcc
value: [0.55205245 0.45439995 0.52947472 0.55205245 0.41708468 0.30381296
0.37442392 0.58625681 0.75377836 0.60004605]
mean value: 0.5123382350291168
key: train_mcc
value: [0.66985139 0.68357391 0.67042724 0.66985139 0.64128448 0.70395629
0.69975956 0.70918624 0.65449684 0.67530917]
mean value: 0.6777696518924042
key: test_accuracy
value: [0.82894737 0.80263158 0.82894737 0.82894737 0.77631579 0.73684211
0.76315789 0.84210526 0.90666667 0.85333333]
mean value: 0.8167894736842105
key: train_accuracy
value: [0.87390029 0.8797654 0.87536657 0.87390029 0.86656891 0.88709677
0.8856305 0.88709677 0.8682284 0.87701318]
mean value: 0.8774567094455632
key: test_fscore
value: [0.66666667 0.57142857 0.62857143 0.66666667 0.56410256 0.47368421
0.52631579 0.68421053 0.8 0.66666667]
mean value: 0.6248313090418354
key: train_fscore
value: [0.75144509 0.75882353 0.74626866 0.75144509 0.70926518 0.77681159
0.77325581 0.78551532 0.73988439 0.75147929]
mean value: 0.7544193946752498
key: test_precision
value: [0.68421053 0.66666667 0.73333333 0.68421053 0.61111111 0.52941176
0.58823529 0.76470588 0.93333333 0.84615385]
mean value: 0.704137228440634
key: train_precision
value: [0.80246914 0.82692308 0.82781457 0.80246914 0.85384615 0.82716049
0.82608696 0.80113636 0.79012346 0.82467532]
mean value: 0.8182704667361305
key: test_recall
value: [0.65 0.5 0.55 0.65 0.52380952 0.42857143
0.47619048 0.61904762 0.7 0.55 ]
mean value: 0.5647619047619048
key: train_recall
value: [0.70652174 0.70108696 0.67934783 0.70652174 0.60655738 0.73224044
0.72677596 0.7704918 0.69565217 0.69021739]
mean value: 0.7015413399857449
key: test_roc_auc
value: [0.77142857 0.70535714 0.73928571 0.77142857 0.6982684 0.64155844
0.67445887 0.77316017 0.84090909 0.75681818]
mean value: 0.737267316017316
key: train_roc_auc
value: [0.82113236 0.82343504 0.8135695 0.82113236 0.78424061 0.83806411
0.83533187 0.85017576 0.81375795 0.81805459]
mean value: 0.821889413503991
key: test_jcc
value: [0.5 0.4 0.45833333 0.5 0.39285714 0.31034483
0.35714286 0.52 0.66666667 0.5 ]
mean value: 0.4605344827586207
key: train_jcc
value: [0.60185185 0.61137441 0.5952381 0.60185185 0.54950495 0.63507109
0.63033175 0.64678899 0.58715596 0.60189573]
mean value: 0.606106468934728
MCC on Blind test: 0.51
Accuracy on Blind test: 0.81
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.66347837 2.15114832 2.75264692 2.73905396 2.1834085 2.67941236
2.28768206 1.77603769 2.09980702 2.73488927]
mean value: 2.4067564487457274
key: score_time
value: [0.01294589 0.01303411 0.01551843 0.01541853 0.01326132 0.01550579
0.01292658 0.012887 0.01287413 0.01300073]
mean value: 0.013737249374389648
key: test_mcc
value: [0.44019762 0.55205245 0.68309183 0.68681493 0.64368314 0.4026607
0.45868247 0.48964721 0.52764485 0.64277498]
mean value: 0.5527250178953496
key: train_mcc
value: [0.97389659 0.90735013 0.96263725 0.97767156 0.93987894 0.97002758
0.95873491 0.89047817 0.90983387 0.97398605]
mean value: 0.9464495043164388
key: test_accuracy
value: [0.78947368 0.82894737 0.88157895 0.86842105 0.85526316 0.77631579
0.78947368 0.78947368 0.82666667 0.86666667]
mean value: 0.8272280701754386
key: train_accuracy
value: [0.98973607 0.96187683 0.98533724 0.99120235 0.97653959 0.98826979
0.98387097 0.95601173 0.96486091 0.9897511 ]
mean value: 0.9787456580636574
key: test_fscore
value: [0.57894737 0.66666667 0.75675676 0.77272727 0.74418605 0.54054054
0.6 0.63636364 0.62857143 0.72222222]
mean value: 0.6646981938781205
key: train_fscore
value: [0.98071625 0.93229167 0.97252747 0.98369565 0.95505618 0.97790055
0.96952909 0.92021277 0.93220339 0.98060942]
mean value: 0.9604742437016127
key: test_precision
value: [0.61111111 0.68421053 0.82352941 0.70833333 0.72727273 0.625
0.63157895 0.60869565 0.73333333 0.8125 ]
mean value: 0.6965565042673334
key: train_precision
value: [0.99441341 0.895 0.98333333 0.98369565 0.98265896 0.98882682
0.98314607 0.89637306 0.97058824 1. ]
mean value: 0.9678035528213172
key: test_recall
value: [0.55 0.65 0.7 0.85 0.76190476 0.47619048
0.57142857 0.66666667 0.55 0.65 ]
mean value: 0.6426190476190476
key: train_recall
value: [0.9673913 0.97282609 0.96195652 0.98369565 0.92896175 0.96721311
0.95628415 0.94535519 0.89673913 0.96195652]
mean value: 0.9542379425041577
key: test_roc_auc
value: [0.7125 0.77142857 0.82321429 0.8625 0.82640693 0.68354978
0.72207792 0.75151515 0.73863636 0.79772727]
mean value: 0.7689556277056278
key: train_roc_auc
value: [0.98269164 0.96532871 0.97796621 0.98883578 0.96147486 0.98160255
0.97513606 0.95263752 0.94335955 0.98097826]
mean value: 0.9710011130457062
key: test_jcc
value: [0.40740741 0.5 0.60869565 0.62962963 0.59259259 0.37037037
0.42857143 0.46666667 0.45833333 0.56521739]
mean value: 0.5027484472049689
key: train_jcc
value: [0.96216216 0.87317073 0.94652406 0.96791444 0.91397849 0.95675676
0.94086022 0.85221675 0.87301587 0.96195652]
mean value: 0.9248556006500929
MCC on Blind test: 0.58
Accuracy on Blind test: 0.83
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03639221 0.03556943 0.02711511 0.0274179 0.02753472 0.03158092
0.03024459 0.03069091 0.03130579 0.0289371 ]
mean value: 0.030678868293762207
key: score_time
value: [0.01144648 0.00936389 0.00901127 0.00917649 0.00972128 0.00914931
0.00938988 0.00916195 0.00979638 0.00974703]
mean value: 0.00959639549255371
key: test_mcc
value: [0.82650337 0.57092239 0.76668414 0.79161589 0.53939394 0.67099567
0.64368314 0.69392691 0.75376307 0.6983799 ]
mean value: 0.69558684338617
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93421053 0.84210526 0.90789474 0.92105263 0.81578947 0.86842105
0.85526316 0.88157895 0.90666667 0.88 ]
mean value: 0.881298245614035
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86486486 0.66666667 0.82926829 0.84210526 0.66666667 0.76190476
0.74418605 0.76923077 0.81081081 0.7804878 ]
mean value: 0.7736191947375038
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94117647 0.75 0.80952381 0.88888889 0.66666667 0.76190476
0.72727273 0.83333333 0.88235294 0.76190476]
mean value: 0.8023024361259655
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.6 0.85 0.8 0.66666667 0.76190476
0.76190476 0.71428571 0.75 0.8 ]
mean value: 0.7504761904761905
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89107143 0.76428571 0.88928571 0.88214286 0.76969697 0.83549784
0.82640693 0.82987013 0.85681818 0.85454545]
mean value: 0.8399621212121212
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76190476 0.5 0.70833333 0.72727273 0.5 0.61538462
0.59259259 0.625 0.68181818 0.64 ]
mean value: 0.6352306212306212
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.7
Accuracy on Blind test: 0.88
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.15146351 0.15494537 0.14572144 0.14574671 0.14581585 0.14503813
0.14534903 0.1439271 0.14520812 0.14628482]
mean value: 0.14695000648498535
key: score_time
value: [0.01862741 0.01811218 0.01806188 0.01808953 0.01812577 0.01807761
0.0182085 0.01812768 0.01813745 0.01835513]
mean value: 0.018192315101623537
key: test_mcc
value: [0.43358045 0.61138605 0.6409855 0.4976283 0.53939394 0.61721663
0.49939976 0.51564585 0.71637516 0.60302269]
mean value: 0.5674634342197126
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.80263158 0.85526316 0.86842105 0.81578947 0.81578947 0.85526316
0.80263158 0.81578947 0.89333333 0.85333333]
mean value: 0.8378245614035088
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.51612903 0.7027027 0.70588235 0.61111111 0.66666667 0.68571429
0.63414634 0.63157895 0.77777778 0.68571429]
mean value: 0.6617423503717906
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72727273 0.76470588 0.85714286 0.6875 0.66666667 0.85714286
0.65 0.70588235 0.875 0.8 ]
mean value: 0.7591313343519226
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.4 0.65 0.6 0.55 0.66666667 0.57142857
0.61904762 0.57142857 0.7 0.6 ]
mean value: 0.5928571428571429
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.67321429 0.78928571 0.78214286 0.73035714 0.76969697 0.76753247
0.74588745 0.74025974 0.83181818 0.77272727]
mean value: 0.7602922077922077
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.34782609 0.54166667 0.54545455 0.44 0.5 0.52173913
0.46428571 0.46153846 0.63636364 0.52173913]
mean value: 0.4980613372135111
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.5
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0110414 0.01136756 0.01222539 0.0108819 0.01099682 0.01106071
0.01090479 0.0108161 0.01123643 0.01108217]
mean value: 0.011161327362060547
key: score_time
value: [0.00881791 0.00877786 0.00997138 0.00875568 0.00880456 0.00875354
0.00876141 0.00882554 0.0088098 0.0087533 ]
mean value: 0.008903098106384278
key: test_mcc
value: [0.27602622 0.45187994 0.49939976 0.45714286 0.54677939 0.18613561
0.24759308 0.36154674 0.22613351 0.18181818]
mean value: 0.34344552857203775
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.72368421 0.77631579 0.80263158 0.78947368 0.80263158 0.69736842
0.73684211 0.73684211 0.72 0.68 ]
mean value: 0.746578947368421
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.46153846 0.60465116 0.63414634 0.6 0.68085106 0.37837838
0.375 0.54545455 0.4 0.4 ]
mean value: 0.5080019953455285
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.47368421 0.56521739 0.61904762 0.6 0.61538462 0.4375
0.54545455 0.52173913 0.46666667 0.4 ]
mean value: 0.5244694178818893
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.45 0.65 0.65 0.6 0.76190476 0.33333333
0.28571429 0.57142857 0.35 0.4 ]
mean value: 0.5052380952380953
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.63571429 0.73571429 0.75357143 0.72857143 0.79004329 0.58484848
0.5974026 0.68571429 0.60227273 0.59090909]
mean value: 0.6704761904761904
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.3 0.43333333 0.46428571 0.42857143 0.51612903 0.23333333
0.23076923 0.375 0.25 0.25 ]
mean value: 0.34814220725511047
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.38
Accuracy on Blind test: 0.75
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.25220633 2.30483413 2.25472188 2.22439718 2.26833797 2.24337077
2.26458836 2.2349503 2.25734282 2.28761482]
mean value: 2.259236454963684
key: score_time
value: [0.1039598 0.09403181 0.09515738 0.09428215 0.09438896 0.09375715
0.0979569 0.0937984 0.10010552 0.09398079]
mean value: 0.0961418867111206
key: test_mcc
value: [0.6409855 0.65104858 0.71751058 0.75907212 0.56622086 0.66254135
0.69986305 0.76353586 0.82577865 0.72009768]
mean value: 0.7006654225859422
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86842105 0.86842105 0.89473684 0.90789474 0.82894737 0.86842105
0.88157895 0.90789474 0.93333333 0.89333333]
mean value: 0.8852982456140351
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.70588235 0.73684211 0.77777778 0.82051282 0.68292683 0.75
0.7804878 0.81081081 0.86486486 0.78947368]
mean value: 0.7719579050527475
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.77777778 0.875 0.84210526 0.7 0.78947368
0.8 0.9375 0.94117647 0.83333333]
mean value: 0.8353509386210625
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6 0.7 0.7 0.8 0.66666667 0.71428571
0.76190476 0.71428571 0.8 0.75 ]
mean value: 0.7207142857142858
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78214286 0.81428571 0.83214286 0.87321429 0.77878788 0.82077922
0.84458874 0.84805195 0.89090909 0.84772727]
mean value: 0.8332629870129871
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.54545455 0.58333333 0.63636364 0.69565217 0.51851852 0.6
0.64 0.68181818 0.76190476 0.65217391]
mean value: 0.6315219064349499
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.67
Accuracy on Blind test: 0.87
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.94173408 1.01432633 1.08325291 1.04413033 1.01967764 1.01618648
1.04146981 1.02105212 1.05572248 1.04665756]
mean value: 1.1284209728240966
key: score_time
value: [0.27791572 0.25200295 0.27757406 0.13488817 0.25813341 0.24319124
0.25587392 0.16115642 0.27250767 0.15622592]
mean value: 0.22894694805145263
key: test_mcc
value: [0.6409855 0.68309183 0.71751058 0.68309183 0.52663543 0.69392691
0.70856367 0.72858509 0.71706665 0.64277498]
mean value: 0.6742232454814421
key: train_mcc
value: [0.92493762 0.93257835 0.92914668 0.94018532 0.91346549 0.93233511
0.94372729 0.93610722 0.93273198 0.9138245 ]
mean value: 0.9299039551930306
key: test_accuracy
value: [0.86842105 0.88157895 0.89473684 0.88157895 0.81578947 0.88157895
0.88157895 0.89473684 0.89333333 0.86666667]
mean value: 0.876
key: train_accuracy
value: [0.97067449 0.97360704 0.97214076 0.97653959 0.96627566 0.97360704
0.97800587 0.97507331 0.97364568 0.96632504]
mean value: 0.9725894471088823
key: test_fscore
value: [0.70588235 0.75675676 0.77777778 0.75675676 0.65 0.76923077
0.79069767 0.77777778 0.76470588 0.72222222]
mean value: 0.7471807970234783
key: train_fscore
value: [0.94413408 0.9494382 0.94586895 0.95505618 0.93409742 0.94915254
0.95774648 0.95211268 0.94915254 0.93447293]
mean value: 0.9471232001455421
key: test_precision
value: [0.85714286 0.82352941 0.875 0.82352941 0.68421053 0.83333333
0.77272727 0.93333333 0.92857143 0.8125 ]
mean value: 0.8343877574953427
key: train_precision
value: [0.97126437 0.98255814 0.99401198 0.98837209 0.98192771 0.98245614
0.98837209 0.98255814 0.98823529 0.98203593]
mean value: 0.9841791882435885
key: test_recall
value: [0.6 0.7 0.7 0.7 0.61904762 0.71428571
0.80952381 0.66666667 0.65 0.65 ]
mean value: 0.680952380952381
key: train_recall
value: [0.91847826 0.91847826 0.90217391 0.92391304 0.89071038 0.91803279
0.92896175 0.92349727 0.91304348 0.89130435]
mean value: 0.9128593490140176
key: test_roc_auc
value: [0.78214286 0.82321429 0.83214286 0.82321429 0.75497835 0.82987013
0.85930736 0.82424242 0.81590909 0.79772727]
mean value: 0.8142748917748918
key: train_roc_auc
value: [0.95421905 0.95622708 0.95008294 0.95994849 0.94234918 0.95601038
0.96247687 0.95874262 0.95451773 0.94264616]
mean value: 0.9537220504235004
key: test_jcc
value: [0.54545455 0.60869565 0.63636364 0.60869565 0.48148148 0.625
0.65384615 0.63636364 0.61904762 0.56521739]
mean value: 0.5980165768209247
key: train_jcc
value: [0.89417989 0.90374332 0.8972973 0.91397849 0.87634409 0.90322581
0.91891892 0.90860215 0.90322581 0.87700535]
mean value: 0.8996521117583736
MCC on Blind test: 0.67
Accuracy on Blind test: 0.87
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0255971 0.01135564 0.01114917 0.01224518 0.01257515 0.01233673
0.01233101 0.01187515 0.01254964 0.01140475]
mean value: 0.013341951370239257
key: score_time
value: [0.00925922 0.00919247 0.00999546 0.00957441 0.00994539 0.0099535
0.00913477 0.00995517 0.0098753 0.00904942]
mean value: 0.009593510627746582
key: test_mcc
value: [0.42968224 0.45714286 0.53968028 0.54096275 0.43257867 0.38416102
0.52663543 0.37442392 0.52764485 0.4955746 ]
mean value: 0.4708486628265047
key: train_mcc
value: [0.54291625 0.53559377 0.53542123 0.56534397 0.53524411 0.55345132
0.5311225 0.54923819 0.54615584 0.52355185]
mean value: 0.5418039021364769
key: test_accuracy
value: [0.76315789 0.78947368 0.82894737 0.81578947 0.77631579 0.75
0.81578947 0.76315789 0.82666667 0.81333333]
mean value: 0.7942631578947369
key: train_accuracy
value: [0.82111437 0.82111437 0.81818182 0.82844575 0.81964809 0.82404692
0.81524927 0.82111437 0.82284041 0.81551977]
mean value: 0.8207275131707192
key: test_fscore
value: [0.59090909 0.6 0.64864865 0.66666667 0.58536585 0.55813953
0.65 0.52631579 0.62857143 0.61111111]
mean value: 0.6065728123922888
key: train_fscore
value: [0.66483516 0.65536723 0.65934066 0.68292683 0.65738162 0.67391304
0.6576087 0.67204301 0.66666667 0.64804469]
mean value: 0.663812760996864
key: test_precision
value: [0.54166667 0.6 0.70588235 0.63636364 0.6 0.54545455
0.68421053 0.58823529 0.73333333 0.6875 ]
mean value: 0.6322646355192795
key: train_precision
value: [0.67222222 0.68235294 0.66666667 0.68108108 0.67045455 0.67027027
0.65405405 0.66137566 0.67597765 0.66666667]
mean value: 0.6701121762598923
key: test_recall
value: [0.65 0.6 0.6 0.7 0.57142857 0.57142857
0.61904762 0.47619048 0.55 0.55 ]
mean value: 0.5888095238095238
key: train_recall
value: [0.6576087 0.63043478 0.65217391 0.68478261 0.64480874 0.67759563
0.66120219 0.68306011 0.6576087 0.63043478]
mean value: 0.6579710144927536
key: test_roc_auc
value: [0.72678571 0.72857143 0.75535714 0.77857143 0.71298701 0.69480519
0.75497835 0.67445887 0.73863636 0.72954545]
mean value: 0.729469696969697
key: train_roc_auc
value: [0.7695674 0.76100052 0.76584599 0.78315436 0.76428814 0.77767557
0.76647284 0.7774018 0.77068812 0.75710116]
mean value: 0.7693195890646318
key: test_jcc
value: [0.41935484 0.42857143 0.48 0.5 0.4137931 0.38709677
0.48148148 0.35714286 0.45833333 0.44 ]
mean value: 0.4365773816880602
key: train_jcc
value: [0.49794239 0.48739496 0.49180328 0.51851852 0.48962656 0.50819672
0.48987854 0.50607287 0.5 0.47933884]
mean value: 0.496877267932884
MCC on Blind test: 0.47
Accuracy on Blind test: 0.8
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.13928175 0.11773467 0.10438395 0.10354257 0.10556793 0.10095
0.10506415 0.10373998 0.11169815 0.11128283]
mean value: 0.11032459735870362
key: score_time
value: [0.01147556 0.01137829 0.0112083 0.01126432 0.01129389 0.01115394
0.01117277 0.01144838 0.01209068 0.01141405]
mean value: 0.011390018463134765
key: test_mcc
value: [0.66071429 0.77709656 0.82807867 0.83350524 0.60519481 0.75730256
0.7734442 0.73049431 0.82728639 0.89983564]
mean value: 0.769295265668651
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86842105 0.90789474 0.93421053 0.93421053 0.84210526 0.89473684
0.90789474 0.89473684 0.93333333 0.96 ]
mean value: 0.9077543859649123
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.8372093 0.87179487 0.87804878 0.71428571 0.82608696
0.8372093 0.8 0.87179487 0.92682927]
mean value: 0.8313259067828848
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.7826087 0.89473684 0.85714286 0.71428571 0.76
0.81818182 0.84210526 0.89473684 0.9047619 ]
mean value: 0.821855993739289
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.9 0.85 0.9 0.71428571 0.9047619
0.85714286 0.76190476 0.85 0.95 ]
mean value: 0.8438095238095238
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83035714 0.90535714 0.90714286 0.92321429 0.8025974 0.8978355
0.89220779 0.85367965 0.90681818 0.95681818]
mean value: 0.8876028138528138
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.72 0.77272727 0.7826087 0.55555556 0.7037037
0.72 0.66666667 0.77272727 0.86363636]
mean value: 0.7157625530669008
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05319357 0.09460664 0.08337164 0.08630013 0.06278348 0.07761669
0.06303596 0.06139302 0.0718267 0.0702157 ]
mean value: 0.07243435382843018
key: score_time
value: [0.01877332 0.01906776 0.02187419 0.02077937 0.01742435 0.01251388
0.02134514 0.01239467 0.01223898 0.02151847]
mean value: 0.01779301166534424
key: test_mcc
value: [0.49396542 0.41403934 0.58076493 0.64700991 0.55369745 0.29893648
0.45868247 0.52663543 0.50830425 0.68174749]
mean value: 0.5163783158687285
key: train_mcc
value: [0.77596943 0.75954937 0.77050634 0.76515426 0.78951424 0.77260991
0.76769444 0.76847684 0.74055285 0.77000441]
mean value: 0.7680032094125958
key: test_accuracy
value: [0.78947368 0.77631579 0.84210526 0.85526316 0.81578947 0.72368421
0.78947368 0.81578947 0.81333333 0.88 ]
mean value: 0.8101228070175439
key: train_accuracy
value: [0.91202346 0.90615836 0.91055718 0.90762463 0.91788856 0.91055718
0.90909091 0.90909091 0.89751098 0.91068814]
mean value: 0.9091190323868735
key: test_fscore
value: [0.63636364 0.56410256 0.68421053 0.74418605 0.68181818 0.48780488
0.6 0.65 0.63157895 0.75675676]
mean value: 0.6436821537285758
key: train_fscore
value: [0.83606557 0.82320442 0.83102493 0.82833787 0.84530387 0.83378747
0.82967033 0.83060109 0.81081081 0.83008357]
mean value: 0.8298889931247613
key: test_precision
value: [0.58333333 0.57894737 0.72222222 0.69565217 0.65217391 0.5
0.63157895 0.68421053 0.66666667 0.82352941]
mean value: 0.6538314563048713
key: train_precision
value: [0.84065934 0.83707865 0.84745763 0.83060109 0.8547486 0.83152174
0.83425414 0.83060109 0.80645161 0.85142857]
mean value: 0.8364802475716324
key: test_recall
value: [0.7 0.55 0.65 0.8 0.71428571 0.47619048
0.57142857 0.61904762 0.6 0.7 ]
mean value: 0.638095238095238
key: train_recall
value: [0.83152174 0.80978261 0.81521739 0.82608696 0.83606557 0.83606557
0.82513661 0.83060109 0.81521739 0.80978261]
mean value: 0.8235477548111191
key: test_roc_auc
value: [0.76071429 0.70357143 0.78035714 0.8375 0.78441558 0.64718615
0.72207792 0.75497835 0.74545455 0.82272727]
mean value: 0.7558982683982683
key: train_roc_auc
value: [0.8866444 0.87577484 0.88050026 0.88191898 0.89198068 0.88697066
0.88250819 0.88423842 0.87153655 0.8788392 ]
mean value: 0.8820912189158894
key: test_jcc
value: [0.46666667 0.39285714 0.52 0.59259259 0.51724138 0.32258065
0.42857143 0.48148148 0.46153846 0.60869565]
mean value: 0.4792225450353322
key: train_jcc
value: [0.71830986 0.69953052 0.71090047 0.70697674 0.73205742 0.71495327
0.70892019 0.71028037 0.68181818 0.70952381]
mean value: 0.7093270833969725
MCC on Blind test: 0.58
Accuracy on Blind test: 0.84
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01471019 0.01353598 0.01168084 0.01050878 0.0105772 0.01063776
0.01066518 0.01089907 0.01060247 0.01091695]
mean value: 0.011473441123962402
key: score_time
value: [0.0122757 0.0103693 0.00927901 0.00910711 0.0088954 0.0088973
0.00879335 0.00883627 0.00886154 0.00904274]
mean value: 0.009435772895812988
key: test_mcc
value: [0.43257867 0.56622086 0.621059 0.45187994 0.59458839 0.45868247
0.49939976 0.52663543 0.6048462 0.68174749]
mean value: 0.5437638210554387
key: train_mcc
value: [0.57575956 0.57012347 0.55191097 0.55790628 0.56084927 0.56982156
0.56533933 0.56234007 0.55668873 0.52682877]
mean value: 0.5597568021362611
key: test_accuracy
value: [0.77631579 0.82894737 0.85526316 0.77631579 0.84210526 0.78947368
0.80263158 0.81578947 0.84 0.88 ]
mean value: 0.8206842105263158
key: train_accuracy
value: [0.83284457 0.83284457 0.82404692 0.82697947 0.82844575 0.8313783
0.82991202 0.82844575 0.8272328 0.81405564]
mean value: 0.8276185794085951
key: test_fscore
value: [0.58536585 0.68292683 0.71794872 0.60465116 0.7 0.6
0.63414634 0.65 0.71428571 0.75675676]
mean value: 0.664608137617213
key: train_fscore
value: [0.69021739 0.68333333 0.67213115 0.67582418 0.67768595 0.68493151
0.68131868 0.67945205 0.67403315 0.65395095]
mean value: 0.6772878344228326
key: test_precision
value: [0.57142857 0.66666667 0.73684211 0.56521739 0.73684211 0.63157895
0.65 0.68421053 0.68181818 0.82352941]
mean value: 0.6748133907192999
key: train_precision
value: [0.69021739 0.69886364 0.67582418 0.68333333 0.68333333 0.68681319
0.68508287 0.68131868 0.68539326 0.6557377 ]
mean value: 0.6825917574563871
key: test_recall
value: [0.6 0.7 0.7 0.65 0.66666667 0.57142857
0.61904762 0.61904762 0.75 0.7 ]
mean value: 0.6576190476190475
key: train_recall
value: [0.69021739 0.66847826 0.66847826 0.66847826 0.67213115 0.68306011
0.67759563 0.67759563 0.66304348 0.65217391]
mean value: 0.6721252078878593
key: test_roc_auc
value: [0.71964286 0.7875 0.80535714 0.73571429 0.78787879 0.72207792
0.74588745 0.75497835 0.81136364 0.82272727]
mean value: 0.7693127705627706
key: train_roc_auc
value: [0.78787978 0.78102628 0.77500218 0.77701021 0.77895135 0.78441583
0.78168359 0.78068158 0.77540951 0.7629607 ]
mean value: 0.7785021014127629
key: test_jcc
value: [0.4137931 0.51851852 0.56 0.43333333 0.53846154 0.42857143
0.46428571 0.48148148 0.55555556 0.60869565]
mean value: 0.500269632582976
key: train_jcc
value: [0.52697095 0.51898734 0.50617284 0.51037344 0.5125 0.52083333
0.51666667 0.51452282 0.50833333 0.48582996]
mean value: 0.5121190694042841
MCC on Blind test: 0.48
Accuracy on Blind test: 0.8
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02312326 0.01958394 0.0229795 0.02434564 0.0250411 0.02593589
0.02374864 0.02295756 0.02466846 0.02031732]
mean value: 0.023270130157470703
key: score_time
value: [0.01271677 0.01243877 0.01255846 0.01221299 0.01256347 0.01247096
0.01330996 0.01226854 0.01219082 0.01226783]
mean value: 0.012499856948852538
key: test_mcc
value: [0.18576195 0.47809144 0.47657854 0.57009641 0.62473393 0.44503488
0. 0.49168478 0.55925894 0.64277498]
mean value: 0.44740158516400225
key: train_mcc
value: [0.22933185 0.55584409 0.46931111 0.6515961 0.68742226 0.74986385
0.21142669 0.6664289 0.65980274 0.66844332]
mean value: 0.554947091605564
key: test_accuracy
value: [0.75 0.71052632 0.81578947 0.76315789 0.81578947 0.78947368
0.72368421 0.73684211 0.84 0.86666667]
mean value: 0.7811929824561403
key: train_accuracy
value: [0.74926686 0.74193548 0.80791789 0.81818182 0.85043988 0.90029326
0.74780059 0.8255132 0.87262079 0.87115666]
mean value: 0.8185126426022851
key: test_fscore
value: [0.17391304 0.62068966 0.5 0.67857143 0.73076923 0.57894737
0. 0.64285714 0.625 0.72222222]
mean value: 0.5272970091491752
key: train_fscore
value: [0.1319797 0.66917293 0.46530612 0.74058577 0.77027027 0.81818182
0.11340206 0.74947368 0.72555205 0.75555556]
mean value: 0.5939479964816883
key: test_precision
value: [0.66666667 0.47368421 0.875 0.52777778 0.61290323 0.64705882
0. 0.51428571 0.83333333 0.8125 ]
mean value: 0.5963209751925671
key: train_precision
value: [1. 0.51149425 0.93442623 0.60204082 0.65517241 0.80104712
1. 0.60958904 0.86466165 0.77272727]
mean value: 0.7751158800878744
key: test_recall
value: [0.1 0.9 0.35 0.95 0.9047619 0.52380952
0. 0.85714286 0.5 0.65 ]
mean value: 0.5735714285714286
key: train_recall
value: [0.07065217 0.9673913 0.30978261 0.96195652 0.93442623 0.83606557
0.06010929 0.9726776 0.625 0.73913043]
mean value: 0.6477191732002852
key: test_roc_auc
value: [0.54107143 0.77142857 0.66607143 0.82321429 0.84329004 0.70735931
0.5 0.77402597 0.73181818 0.79772727]
mean value: 0.7156006493506493
key: train_roc_auc
value: [0.53532609 0.81301292 0.65087524 0.86350838 0.87703275 0.87995663
0.53005464 0.87211034 0.79446393 0.82948506]
mean value: 0.7645825988897821
key: test_jcc
value: [0.0952381 0.45 0.33333333 0.51351351 0.57575758 0.40740741
0. 0.47368421 0.45454545 0.56521739]
mean value: 0.3868696981626043
key: train_jcc
value: [0.07065217 0.50282486 0.30319149 0.58803987 0.62637363 0.69230769
0.06010929 0.5993266 0.56930693 0.60714286]
mean value: 0.4619275384602773
MCC on Blind test: 0.58
Accuracy on Blind test: 0.84
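Note on the all-zero fold above: in the seventh fold the Passive Aggressive model reports test_precision, test_recall, test_fscore, test_mcc and test_jcc of 0 and test_roc_auc of 0.5, the pattern produced when a classifier predicts no positive samples at all. The sketch below recomputes the metrics from confusion-matrix counts in plain Python. The counts (0 TP, 0 FP, 21 FN, 55 TN) are an assumption inferred from the logged accuracy of 0.72368421, not values printed by the script, and `binary_metrics` is a hypothetical helper, not part of the original pipeline. ROC AUC is computed here from hard labels, which appears to match the logged values but is also an assumption.

```python
import math

def binary_metrics(tp, fp, fn, tn, zero_division=0.0):
    """Recompute the logged metrics from confusion-matrix counts (hypothetical helper)."""
    # Precision is undefined when nothing is predicted positive; fall back to zero_division.
    precision = tp / (tp + fp) if (tp + fp) else zero_division
    recall = tp / (tp + fn) if (tp + fn) else zero_division
    fscore = (2 * precision * recall / (precision + recall)
              if (precision + recall) else zero_division)
    # MCC is conventionally reported as 0 when its denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # With hard 0/1 predictions, ROC AUC reduces to the mean of recall and specificity.
    specificity = tn / (tn + fp) if (tn + fp) else zero_division
    roc_auc = (recall + specificity) / 2
    jcc = tp / (tp + fp + fn) if (tp + fp + fn) else zero_division
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"mcc": mcc, "accuracy": accuracy, "fscore": fscore,
            "precision": precision, "recall": recall,
            "roc_auc": roc_auc, "jcc": jcc}

# Assumed counts for the seventh fold: every sample predicted negative.
fold7 = binary_metrics(tp=0, fp=0, fn=21, tn=55)
for key, value in fold7.items():
    print(f"{key}: {value:.8f}")
```

Under these assumed counts the accuracy of 55/76 stays high despite zero recall, which is why MCC and F-score are the more informative columns for this imbalanced target.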
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02798748 0.0284133 0.03016829 0.03018427 0.03876376 0.03744674
0.05252314 0.02689528 0.03849101 0.02904224]
mean value: 0.033991551399230956
key: score_time
value: [0.01247096 0.01632237 0.0129602 0.01235032 0.01333857 0.0248065
0.01258373 0.01246619 0.01255035 0.0125165 ]
mean value: 0.01423656940460205
key: test_mcc
value: [0.55205245 0.59285714 0.56294295 0.69006556 0.31077631 0.31077631
0.52663543 0.54701077 0.48900965 0.62764591]
mean value: 0.520977247933603
key: train_mcc
value: [0.81526695 0.76569585 0.59161944 0.70559502 0.47781982 0.50535412
0.793226 0.70835718 0.58207667 0.63419533]
mean value: 0.6579206367061766
key: test_accuracy
value: [0.82894737 0.84210526 0.84210526 0.88157895 0.76315789 0.76315789
0.81578947 0.82894737 0.81333333 0.78666667]
mean value: 0.8165789473684211
key: train_accuracy
value: [0.92815249 0.90322581 0.84750733 0.88856305 0.81085044 0.81964809
0.91788856 0.89002933 0.84480234 0.80380673]
mean value: 0.8654474180238125
key: test_fscore
value: [0.66666667 0.7 0.6 0.76923077 0.30769231 0.30769231
0.65 0.64864865 0.46153846 0.71428571]
mean value: 0.5825754875754876
key: train_fscore
value: [0.86350975 0.83076923 0.62318841 0.76969697 0.4691358 0.50996016
0.84946237 0.7706422 0.61594203 0.72653061]
mean value: 0.7028837526055274
key: test_precision
value: [0.68421053 0.7 0.9 0.78947368 0.8 0.8
0.68421053 0.75 1. 0.55555556]
mean value: 0.7663450292397661
key: train_precision
value: [0.88571429 0.78640777 0.93478261 0.86986301 0.95 0.94117647
0.83597884 0.875 0.92391304 0.58169935]
mean value: 0.858453537154942
key: test_recall
value: [0.65 0.7 0.45 0.75 0.19047619 0.19047619
0.61904762 0.57142857 0.3 1. ]
mean value: 0.5421428571428571
key: train_recall
value: [0.8423913 0.88043478 0.4673913 0.69021739 0.31147541 0.34972678
0.86338798 0.68852459 0.46195652 0.9673913 ]
mean value: 0.6522897362794012
key: test_roc_auc
value: [0.77142857 0.79642857 0.71607143 0.83928571 0.58614719 0.58614719
0.75497835 0.74935065 0.65 0.85454545]
mean value: 0.7304383116883116
key: train_roc_auc
value: [0.90111533 0.89604068 0.72767156 0.82603239 0.65273169 0.67085537
0.90063186 0.82622622 0.72396423 0.85543914]
mean value: 0.7980708486147069
key: test_jcc
value: [0.5 0.53846154 0.42857143 0.625 0.18181818 0.18181818
0.48148148 0.48 0.3 0.55555556]
mean value: 0.42727063677063676
key: train_jcc
value: [0.75980392 0.71052632 0.45263158 0.62561576 0.30645161 0.34224599
0.73831776 0.62686567 0.44502618 0.57051282]
mean value: 0.5577997609234735
MCC on Blind test: 0.61
Accuracy on Blind test: 0.85
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.24267983 0.22689581 0.22911096 0.23088861 0.2304976 0.23003793
0.22936296 0.22639275 0.22763729 0.22939324]
mean value: 0.23028969764709473
key: score_time
value: [0.01547527 0.01565886 0.01581311 0.01586604 0.0156765 0.01581097
0.01544333 0.01569009 0.01573777 0.01564431]
mean value: 0.01568162441253662
key: test_mcc
value: [0.59285714 0.73862221 0.75907212 0.76668414 0.64368314 0.64368314
0.67099567 0.73049431 0.59090909 0.68863504]
mean value: 0.68256359998009
key: train_mcc
value: [0.93301467 0.91490177 0.94027897 0.93043955 0.92981722 0.94746341
0.92593033 0.9327836 0.92589187 0.93330543]
mean value: 0.9313826823116439
key: test_accuracy
value: [0.84210526 0.89473684 0.90789474 0.90789474 0.85526316 0.85526316
0.86842105 0.89473684 0.84 0.88 ]
mean value: 0.8746315789473684
key: train_accuracy
value: [0.97360704 0.96627566 0.97653959 0.97214076 0.97214076 0.97947214
0.97067449 0.97360704 0.97071742 0.97364568]
mean value: 0.9728820581959013
key: test_fscore
value: [0.7 0.80952381 0.82051282 0.82926829 0.74418605 0.74418605
0.76190476 0.8 0.7 0.76923077]
mean value: 0.7678812546878344
key: train_fscore
value: [0.95108696 0.93800539 0.95628415 0.94933333 0.94878706 0.96132597
0.94594595 0.95081967 0.94594595 0.95135135]
mean value: 0.9498885777915945
key: test_precision
value: [0.7 0.77272727 0.84210526 0.80952381 0.72727273 0.72727273
0.76190476 0.84210526 0.7 0.78947368]
mean value: 0.7672385509227614
key: train_precision
value: [0.95108696 0.93048128 0.96153846 0.93193717 0.93617021 0.97206704
0.93582888 0.95081967 0.94086022 0.94623656]
mean value: 0.9457026449459676
key: test_recall
value: [0.7 0.85 0.8 0.85 0.76190476 0.76190476
0.76190476 0.76190476 0.7 0.75 ]
mean value: 0.7697619047619048
key: train_recall
value: [0.95108696 0.94565217 0.95108696 0.9673913 0.96174863 0.95081967
0.95628415 0.95081967 0.95108696 0.95652174]
mean value: 0.9542498218104063
key: test_roc_auc
value: [0.79642857 0.88035714 0.87321429 0.88928571 0.82640693 0.82640693
0.83549784 0.85367965 0.79545455 0.83863636]
mean value: 0.8415367965367966
key: train_roc_auc
value: [0.96650733 0.95977388 0.96851537 0.97064344 0.96885027 0.97039982
0.96611803 0.9663918 0.96452143 0.96824083]
mean value: 0.9669962197880291
key: test_jcc
value: [0.53846154 0.68 0.69565217 0.70833333 0.59259259 0.59259259
0.61538462 0.66666667 0.53846154 0.625 ]
mean value: 0.6253145051405921
key: train_jcc
value: [0.90673575 0.88324873 0.91623037 0.9035533 0.9025641 0.92553191
0.8974359 0.90625 0.8974359 0.90721649]
mean value: 0.9046202455419211
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.09009361 0.10407043 0.09775949 0.11060524 0.10526705 0.10704541
0.05889678 0.09313583 0.11835146 0.11643147]
mean value: 0.10016567707061767
key: score_time
value: [0.0344286 0.02585244 0.0223434 0.03886867 0.02385998 0.02790022
0.02931643 0.03040266 0.03983021 0.02888322]
mean value: 0.030168581008911132
key: test_mcc
value: [0.72857143 0.8045087 0.79161589 0.79642857 0.66254135 0.70856367
0.67099567 0.73049431 0.79545455 0.79069549]
mean value: 0.747986962461053
key: train_mcc
value: [0.99255719 0.99260991 0.99255719 0.98140504 0.96257518 0.98133991
0.98133991 0.97754897 0.98509865 0.97770013]
mean value: 0.982473206049973
key: test_accuracy
value: [0.89473684 0.92105263 0.92105263 0.92105263 0.86842105 0.88157895
0.86842105 0.89473684 0.92 0.92 ]
mean value: 0.9011052631578947
key: train_accuracy
value: [0.99706745 0.99706745 0.99706745 0.99266862 0.98533724 0.99266862
0.99266862 0.99120235 0.99414348 0.99121523]
mean value: 0.9931106512153128
key: test_fscore
value: [0.8 0.85714286 0.84210526 0.85 0.75 0.79069767
0.76190476 0.8 0.85 0.84210526]
mean value: 0.8143955819782014
key: train_fscore
value: [0.99456522 0.99459459 0.99456522 0.9862259 0.97206704 0.98614958
0.98614958 0.98342541 0.98907104 0.98342541]
mean value: 0.987023899975587
key: test_precision
value: [0.8 0.81818182 0.88888889 0.85 0.78947368 0.77272727
0.76190476 0.84210526 0.85 0.88888889]
mean value: 0.8262170577960052
key: train_precision
value: [0.99456522 0.98924731 0.99456522 1. 0.99428571 1.
1. 0.99441341 0.99450549 1. ]
mean value: 0.9961582363223004
key: test_recall
value: [0.8 0.9 0.8 0.85 0.71428571 0.80952381
0.76190476 0.76190476 0.85 0.8 ]
mean value: 0.8047619047619048
key: train_recall
value: [0.99456522 1. 0.99456522 0.97282609 0.95081967 0.9726776
0.9726776 0.9726776 0.98369565 0.9673913 ]
mean value: 0.9781895937277263
key: test_roc_auc
value: [0.86428571 0.91428571 0.88214286 0.89821429 0.82077922 0.85930736
0.83549784 0.85367965 0.89772727 0.88181818]
mean value: 0.8707738095238096
key: train_roc_auc
value: [0.99627859 0.99799197 0.99627859 0.98641304 0.97440783 0.9863388
0.9863388 0.98533679 0.99084582 0.98369565]
mean value: 0.9883925892357555
key: test_jcc
value: [0.66666667 0.75 0.72727273 0.73913043 0.6 0.65384615
0.61538462 0.66666667 0.73913043 0.72727273]
mean value: 0.6885370426674774
key: train_jcc
value: [0.98918919 0.98924731 0.98918919 0.97282609 0.94565217 0.9726776
0.9726776 0.9673913 0.97837838 0.9673913 ]
mean value: 0.9744620129406761
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.21727848 0.22791338 0.24239659 0.35803866 0.32010889 0.32323623
0.31458378 0.28612709 0.32833815 0.32110643]
mean value: 0.29391276836395264
key: score_time
value: [0.03302312 0.03829074 0.03578234 0.03027177 0.03229713 0.03090048
0.01834702 0.01790905 0.03131437 0.03010559]
mean value: 0.029824161529541017
key: test_mcc
value: [0.3790877 0.44270548 0.38615095 0.33584695 0.37798833 0.36333066
0.31195458 0.36333066 0.47535069 0.37787109]
mean value: 0.3813617084212999
key: train_mcc
value: [0.9256841 0.92196808 0.91453396 0.91453396 0.91048427 0.91794953
0.91390497 0.92914132 0.91829032 0.92943378]
mean value: 0.9195924280035491
key: test_accuracy
value: [0.78947368 0.80263158 0.78947368 0.77631579 0.77631579 0.77631579
0.76315789 0.77631579 0.81333333 0.78666667]
mean value: 0.785
key: train_accuracy
value: [0.97067449 0.96920821 0.96627566 0.96627566 0.96480938 0.96774194
0.96627566 0.97214076 0.96778917 0.97218155]
mean value: 0.9683372476953925
key: test_fscore
value: [0.38461538 0.54545455 0.46666667 0.4137931 0.48484848 0.4137931
0.35714286 0.4137931 0.5 0.38461538]
mean value: 0.4364722633688151
key: train_fscore
value: [0.94252874 0.93948127 0.93333333 0.93333333 0.92982456 0.93604651
0.93333333 0.94524496 0.93641618 0.94555874]
mean value: 0.9375100957673573
key: test_precision
value: [0.83333333 0.69230769 0.7 0.66666667 0.66666667 0.75
0.71428571 0.75 0.875 0.83333333]
mean value: 0.7481593406593406
key: train_precision
value: [1. 1. 1. 1. 1. 1.
0.99382716 1. 1. 1. ]
mean value: 0.9993827160493827
key: test_recall
value: [0.25 0.45 0.35 0.3 0.38095238 0.28571429
0.23809524 0.28571429 0.35 0.25 ]
mean value: 0.314047619047619
key: train_recall
value: [0.89130435 0.88586957 0.875 0.875 0.86885246 0.87978142
0.87978142 0.89617486 0.88043478 0.89673913]
mean value: 0.8828937990021383
key: test_roc_auc
value: [0.61607143 0.68928571 0.64821429 0.62321429 0.65411255 0.62467532
0.6008658 0.62467532 0.66590909 0.61590909]
mean value: 0.63629329004329
key: train_roc_auc
value: [0.94565217 0.94293478 0.9375 0.9375 0.93442623 0.93989071
0.93888871 0.94808743 0.94021739 0.94836957]
mean value: 0.9413466991002676
key: test_jcc
value: [0.23809524 0.375 0.30434783 0.26086957 0.32 0.26086957
0.2173913 0.26086957 0.33333333 0.23809524]
mean value: 0.2808871635610766
key: train_jcc
value: [0.89130435 0.88586957 0.875 0.875 0.86885246 0.87978142
0.875 0.89617486 0.88043478 0.89673913]
mean value: 0.8824156569256355
MCC on Blind test: 0.35
Accuracy on Blind test: 0.78
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.01005697 0.99557376 1.00276566 0.99281025 1.00908518 0.99489617
1.00263834 1.00866675 0.99828649 0.99859738]
mean value: 1.0013376951217652
key: score_time
value: [0.01030517 0.009691 0.01017618 0.00980783 0.07066965 0.01032805
0.00980544 0.00971413 0.00966191 0.00945163]
mean value: 0.015961098670959472
key: test_mcc
value: [0.71205323 0.77709656 0.87039519 0.83350524 0.63304195 0.71964027
0.70856367 0.69986305 0.79545455 0.8035183 ]
mean value: 0.7553132004938042
key: train_mcc
value: [1. 0.99627913 1. 0.9888617 1. 1.
1. 1. 1. 1. ]
mean value: 0.9985140825524362
key: test_accuracy
value: [0.88157895 0.90789474 0.94736842 0.93421053 0.85526316 0.88157895
0.88157895 0.88157895 0.92 0.92 ]
mean value: 0.9011052631578947
key: train_accuracy
value: [1. 0.99853372 1. 0.99560117 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994134897360704
key: test_fscore
value: [0.79069767 0.8372093 0.9047619 0.87804878 0.73170732 0.8
0.79069767 0.7804878 0.85 0.85714286]
mean value: 0.8220753315506577
key: train_fscore
value: [1. 0.9972752 1. 0.99186992 1. 1.
1. 1. 1. 1. ]
mean value: 0.998914512305886
key: test_precision
value: [0.73913043 0.7826087 0.86363636 0.85714286 0.75 0.75
0.77272727 0.8 0.85 0.81818182]
mean value: 0.7983427442123094
key: train_precision
value: [1. 1. 1. 0.98918919 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989189189189189
key: test_recall
value: [0.85 0.9 0.95 0.9 0.71428571 0.85714286
0.80952381 0.76190476 0.85 0.9 ]
mean value: 0.8492857142857143
key: train_recall
value: [1. 0.99456522 1. 0.99456522 1. 1.
1. 1. 1. 1. ]
mean value: 0.9989130434782608
key: test_roc_auc
value: [0.87142857 0.90535714 0.94821429 0.92321429 0.81168831 0.87402597
0.85930736 0.84458874 0.89772727 0.91363636]
mean value: 0.8849188311688312
key: train_roc_auc
value: [1. 0.99728261 1. 0.99527458 1. 1.
1. 1. 1. 1. ]
mean value: 0.999255718526279
key: test_jcc
value: [0.65384615 0.72 0.82608696 0.7826087 0.57692308 0.66666667
0.65384615 0.64 0.73913043 0.75 ]
mean value: 0.7009108138238573
key: train_jcc
value: [1. 0.99456522 1. 0.98387097 1. 1.
1. 1. 1. 1. ]
mean value: 0.9978436185133239
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03526473 0.0357461 0.03524756 0.04501867 0.03488183 0.0351634
0.04225397 0.03609633 0.03565288 0.06816936]
mean value: 0.040349483489990234
key: score_time
value: [0.01241183 0.01264811 0.01375818 0.01282477 0.01378298 0.01384163
0.01376128 0.02365279 0.01395798 0.0260396 ]
mean value: 0.01566791534423828
key: test_mcc
value: [ 0.19855331 0.08872443 0.15357143 0.20690038 0.18687064 0.21349671
0.13659979 0.2289763 -0.05397347 0.26827168]
mean value: 0.16279911918256598
key: train_mcc
value: [0.36535467 0.29778297 0.3516267 0.28795303 0.28968809 0.39700229
0.30777947 0.37907736 0.31662698 0.30000249]
mean value: 0.3292894060024899
key: test_accuracy
value: [0.48684211 0.39473684 0.44736842 0.42105263 0.40789474 0.55263158
0.43421053 0.56578947 0.34666667 0.48 ]
mean value: 0.453719298245614
key: train_accuracy
value: [0.53519062 0.46334311 0.52052786 0.45307918 0.45454545 0.56891496
0.47360704 0.54985337 0.48316252 0.46559297]
mean value: 0.4967817074060875
key: test_fscore
value: [0.46575342 0.425 0.44736842 0.46341463 0.47058824 0.48484848
0.4556962 0.49230769 0.37974684 0.49350649]
mean value: 0.4578230423787979
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
key: train_fscore
value: [0.53722628 0.5013624 0.5294964 0.49662618 0.49593496 0.55454545
0.50482759 0.54383358 0.51040222 0.50204638]
mean value: 0.517630144384987
key: test_precision
value: [0.32075472 0.28333333 0.30357143 0.30645161 0.3125 0.35555556
0.31034483 0.36363636 0.25423729 0.33333333]
mean value: 0.31437184600361723
key: train_precision
value: [0.36726547 0.33454545 0.36007828 0.33034111 0.32972973 0.3836478
0.33763838 0.37346939 0.34264432 0.33515483]
mean value: 0.3494514754466544
key: test_recall
value: [0.85 0.85 0.85 0.95 0.95238095 0.76190476
0.85714286 0.76190476 0.75 0.95 ]
mean value: 0.8533333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.60357143 0.54107143 0.57678571 0.59107143 0.57619048 0.61731602
0.56493506 0.62640693 0.475 0.62954545]
mean value: 0.580189393939394
key: train_roc_auc
value: [0.68172691 0.63253012 0.67168675 0.62550201 0.62725451 0.70541082
0.64028056 0.69238477 0.64629259 0.63426854]
mean value: 0.6557337566699665
key: test_jcc
value: [0.30357143 0.26984127 0.28813559 0.3015873 0.30769231 0.32
0.29508197 0.32653061 0.234375 0.32758621]
mean value: 0.2974401687267211
key: train_jcc
value: [0.36726547 0.33454545 0.36007828 0.33034111 0.32972973 0.3836478
0.33763838 0.37346939 0.34264432 0.33515483]
mean value: 0.3494514754466544
MCC on Blind test: 0.14
Accuracy on Blind test: 0.42
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0312705 0.03619003 0.03808308 0.03740096 0.01732135 0.01709747
0.01744127 0.02994561 0.04224467 0.03739405]
mean value: 0.030438899993896484
key: score_time
value: [0.02949142 0.02922988 0.02837372 0.01918077 0.01225829 0.0122056
0.01218557 0.01889825 0.02776265 0.01940441]
mean value: 0.020899057388305664
key: test_mcc
value: [0.58196658 0.63304195 0.56390496 0.67273572 0.64368314 0.28501393
0.44503488 0.62471635 0.72727273 0.61930936]
mean value: 0.5796679617856088
key: train_mcc
value: [0.74528173 0.74306574 0.72650934 0.73282742 0.74496772 0.74699975
0.74021989 0.73268489 0.71290303 0.73493383]
mean value: 0.7360393351702404
key: test_accuracy
value: [0.82894737 0.85526316 0.84210526 0.86842105 0.85526316 0.73684211
0.78947368 0.85526316 0.89333333 0.85333333]
mean value: 0.8378245614035088
key: train_accuracy
value: [0.90029326 0.90029326 0.89442815 0.89589443 0.90175953 0.90175953
0.89882698 0.89589443 0.88872621 0.89751098]
mean value: 0.8975386748989923
key: test_fscore
value: [0.69767442 0.73170732 0.64705882 0.76190476 0.74418605 0.44444444
0.57894737 0.71794872 0.8 0.71794872]
mean value: 0.6841820616386557
key: train_fscore
value: [0.81318681 0.81005587 0.79661017 0.8033241 0.8101983 0.81337047
0.80886427 0.8033241 0.7877095 0.80337079]
mean value: 0.8050014371518536
key: test_precision
value: [0.65217391 0.71428571 0.78571429 0.72727273 0.72727273 0.53333333
0.64705882 0.77777778 0.8 0.73684211]
mean value: 0.7101731407492614
key: train_precision
value: [0.82222222 0.83333333 0.82941176 0.81920904 0.84117647 0.82954545
0.82022472 0.81460674 0.81034483 0.83139535]
mean value: 0.8251469922040724
key: test_recall
value: [0.75 0.75 0.55 0.8 0.76190476 0.38095238
0.52380952 0.66666667 0.8 0.7 ]
mean value: 0.6683333333333333
key: train_recall
value: [0.80434783 0.78804348 0.76630435 0.78804348 0.78142077 0.79781421
0.79781421 0.79234973 0.76630435 0.77717391]
mean value: 0.7859616298408173
key: test_roc_auc
value: [0.80357143 0.82142857 0.74821429 0.84642857 0.82640693 0.62683983
0.70735931 0.7969697 0.86363636 0.80454545]
mean value: 0.7845400432900433
key: train_roc_auc
value: [0.8700454 0.86490527 0.85403571 0.86189323 0.86365627 0.86884698
0.86684298 0.86310873 0.85008604 0.85952884]
mean value: 0.8622949451889779
key: test_jcc
value: [0.53571429 0.57692308 0.47826087 0.61538462 0.59259259 0.28571429
0.40740741 0.56 0.66666667 0.56 ]
mean value: 0.5278663799968147
key: train_jcc
value: [0.68518519 0.68075117 0.66197183 0.6712963 0.68095238 0.68544601
0.67906977 0.6712963 0.64976959 0.6713615 ]
mean value: 0.67371000278574
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.25372815 0.26200867 0.38737082 0.51150823 0.44220877 0.36624289
0.32012987 0.29179287 0.28646541 0.35943341]
mean value: 0.3480889081954956
key: score_time
value: [0.01247811 0.01930356 0.02239323 0.02633381 0.02799511 0.01895785
0.03551054 0.01894855 0.0190196 0.01937056]
mean value: 0.022031092643737794
key: test_mcc
value: [0.58196658 0.63304195 0.56390496 0.67273572 0.64368314 0.28501393
0.44503488 0.62471635 0.72727273 0.61930936]
mean value: 0.5796679617856088
key: train_mcc
value: [0.74528173 0.74306574 0.72650934 0.73282742 0.74496772 0.74699975
0.74021989 0.73268489 0.71290303 0.73493383]
mean value: 0.7360393351702404
key: test_accuracy
value: [0.82894737 0.85526316 0.84210526 0.86842105 0.85526316 0.73684211
0.78947368 0.85526316 0.89333333 0.85333333]
mean value: 0.8378245614035088
key: train_accuracy
value: [0.90029326 0.90029326 0.89442815 0.89589443 0.90175953 0.90175953
0.89882698 0.89589443 0.88872621 0.89751098]
mean value: 0.8975386748989923
key: test_fscore
value: [0.69767442 0.73170732 0.64705882 0.76190476 0.74418605 0.44444444
0.57894737 0.71794872 0.8 0.71794872]
mean value: 0.6841820616386557
key: train_fscore
value: [0.81318681 0.81005587 0.79661017 0.8033241 0.8101983 0.81337047
0.80886427 0.8033241 0.7877095 0.80337079]
mean value: 0.8050014371518536
key: test_precision
value: [0.65217391 0.71428571 0.78571429 0.72727273 0.72727273 0.53333333
0.64705882 0.77777778 0.8 0.73684211]
mean value: 0.7101731407492614
key: train_precision
value: [0.82222222 0.83333333 0.82941176 0.81920904 0.84117647 0.82954545
0.82022472 0.81460674 0.81034483 0.83139535]
mean value: 0.8251469922040724
key: test_recall
value: [0.75 0.75 0.55 0.8 0.76190476 0.38095238
0.52380952 0.66666667 0.8 0.7 ]
mean value: 0.6683333333333333
key: train_recall
value: [0.80434783 0.78804348 0.76630435 0.78804348 0.78142077 0.79781421
0.79781421 0.79234973 0.76630435 0.77717391]
mean value: 0.7859616298408173
key: test_roc_auc
value: [0.80357143 0.82142857 0.74821429 0.84642857 0.82640693 0.62683983
0.70735931 0.7969697 0.86363636 0.80454545]
mean value: 0.7845400432900433
key: train_roc_auc
value: [0.8700454 0.86490527 0.85403571 0.86189323 0.86365627 0.86884698
0.86684298 0.86310873 0.85008604 0.85952884]
mean value: 0.8622949451889779
key: test_jcc
value: [0.53571429 0.57692308 0.47826087 0.61538462 0.59259259 0.28571429
0.40740741 0.56 0.66666667 0.56 ]
mean value: 0.5278663799968147
key: train_jcc
value: [0.68518519 0.68075117 0.66197183 0.6712963 0.68095238 0.68544601
0.67906977 0.6712963 0.64976959 0.6713615 ]
mean value: 0.67371000278574
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
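
Note on the ConvergenceWarning: the warning above names two remedies, raising `max_iter` and scaling the data. The sketch below shows one possible way to apply both and is not the original script's configuration. The synthetic data from `make_classification`, the factor of 1000 used to exaggerate feature scale, and `max_iter=1000` are all assumptions chosen for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix (assumption: the real
# features are on very different scales, which slows lbfgs down).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X = X * 1000.0

# Standardise features inside the pipeline so that, under cross-validation,
# the scaler is fitted on each training fold only, then give lbfgs a larger
# iteration budget than the default of 100.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)

# Record warnings during fitting so convergence can be checked explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarnings raised:", n_conv)
```

Putting the scaler inside the pipeline rather than scaling the whole dataset up front avoids leaking test-fold statistics into training.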
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04412484 0.04355478 0.05054665 0.05056119 0.04378176 0.04390931
0.04411793 0.05126095 0.05250835 0.05323243]
mean value: 0.04775981903076172
key: score_time
value: [0.01327801 0.01320815 0.01321435 0.01340771 0.02071118 0.01331592
0.01335812 0.01337099 0.01346803 0.01352692]
mean value: 0.014085936546325683
key: test_mcc
value: [0.65875884 0.76590909 0.78434561 0.79230071 0.82447186 0.75530907
0.76675488 0.75530907 0.8376106 0.83984125]
mean value: 0.7780610974433546
key: train_mcc
value: [0.82698297 0.82612027 0.83101258 0.81404424 0.82455243 0.82777216
0.82298562 0.81837105 0.80646893 0.82284436]
mean value: 0.8221154595374749
key: test_accuracy
value: [0.82882883 0.88288288 0.89189189 0.89189189 0.90990991 0.87387387
0.88288288 0.87387387 0.91818182 0.91818182]
mean value: 0.8872399672399672
key: train_accuracy
value: [0.91173521 0.91173521 0.91374122 0.90571715 0.9107322 0.91273821
0.90972919 0.90772317 0.90180361 0.90981964]
mean value: 0.9095474801156977
key: test_fscore
value: [0.83185841 0.88288288 0.89285714 0.89830508 0.91525424 0.88333333
0.88695652 0.88333333 0.92035398 0.92173913]
mean value: 0.8916874055995034
key: train_fscore
value: [0.91570881 0.91522158 0.91762452 0.90944123 0.91434071 0.91577928
0.91362764 0.91136802 0.90576923 0.91362764]
mean value: 0.9132508666793058
key: test_precision
value: [0.81034483 0.875 0.87719298 0.84126984 0.87096774 0.828125
0.86440678 0.828125 0.89655172 0.88333333]
mean value: 0.8575317230379954
key: train_precision
value: [0.87706422 0.8812616 0.87889908 0.87569573 0.8780037 0.88411215
0.875 0.87592593 0.87060998 0.87661142]
mean value: 0.8773183803018094
key: test_recall
value: [0.85454545 0.89090909 0.90909091 0.96363636 0.96428571 0.94642857
0.91071429 0.94642857 0.94545455 0.96363636]
mean value: 0.929512987012987
key: train_recall
value: [0.95791583 0.95190381 0.95991984 0.94589178 0.95381526 0.9497992
0.95582329 0.9497992 0.94388778 0.95390782]
mean value: 0.952266380149858
key: test_roc_auc
value: [0.82905844 0.88295455 0.89204545 0.89253247 0.90941558 0.87321429
0.88262987 0.87321429 0.91818182 0.91818182]
mean value: 0.8871428571428571
key: train_roc_auc
value: [0.91168884 0.91169488 0.91369486 0.90567682 0.91077537 0.91277535
0.90977537 0.90776533 0.90180361 0.90981964]
mean value: 0.9095470056579021
key: test_jcc
value: [0.71212121 0.79032258 0.80645161 0.81538462 0.84375 0.79104478
0.796875 0.79104478 0.85245902 0.85483871]
mean value: 0.8054292299363882
key: train_jcc
value: [0.84452297 0.84369449 0.84778761 0.83392226 0.84219858 0.84464286
0.8409894 0.83716814 0.82776801 0.8409894 ]
mean value: 0.8403683727027139
MCC on Blind test: 0.64
Accuracy on Blind test: 0.84
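The key names above (fit_time, score_time, test_mcc, train_mcc, ...) match the dictionary that scikit-learn's cross_validate returns when given a dict of scorers and return_train_score=True. The sketch below shows how output of this shape could be produced. It is an assumption about the approach, not the original script: the synthetic data, the reduced scorer set, and the shuffled StratifiedKFold are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             make_scorer, matthews_corrcoef)
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the training split (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=42)

pipe = Pipeline([
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(random_state=42, max_iter=1000)),
])

# The dict keys become the suffixes of the test_* / train_* entries.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_out = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in cv_out.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```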
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.17848635 1.02553439 1.21671796 1.04096627 1.16208816 1.04139638
1.0986526 1.04701781 1.10708237 1.07402682]
mean value: 1.0991969108581543
key: score_time
value: [0.01554346 0.01578808 0.01572728 0.01564598 0.01846838 0.013484
0.01977873 0.01353335 0.01727462 0.01542902]
mean value: 0.01606729030609131
key: test_mcc
value: [0.71168831 0.78434561 0.856354 0.80845318 0.76868784 0.78818464
0.80286425 0.77570306 0.78181818 0.80119274]
mean value: 0.7879291811034943
key: train_mcc
value: [0.84975048 0.83532183 0.86829651 0.83177022 0.88050193 0.8760821
0.84009966 0.83344638 0.8317077 0.88033245]
mean value: 0.8527309266989362
key: test_accuracy
value: [0.85585586 0.89189189 0.92792793 0.9009009 0.88288288 0.89189189
0.9009009 0.88288288 0.89090909 0.9 ]
mean value: 0.8926044226044226
key: train_accuracy
value: [0.92377131 0.91675025 0.9338014 0.91474423 0.93981946 0.93781344
0.91875627 0.91574724 0.91482966 0.93987976]
mean value: 0.9255913029670173
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_fscore
value: [0.85454545 0.89285714 0.92592593 0.90598291 0.88888889 0.89830508
0.90434783 0.89256198 0.89090909 0.90265487]
mean value: 0.895697917066984
key: train_fscore
value: [0.92649903 0.91949564 0.93516699 0.9178744 0.9410609 0.93873518
0.92173913 0.9184466 0.91771539 0.94094488]
mean value: 0.9277678146355568
key: test_precision
value: [0.85454545 0.87719298 0.94339623 0.85483871 0.85245902 0.85483871
0.88135593 0.83076923 0.89090909 0.87931034]
mean value: 0.8719615697874268
key: train_precision
value: [0.8953271 0.89097744 0.91714836 0.88619403 0.92115385 0.92412451
0.88826816 0.88909774 0.88764045 0.9245648 ]
mean value: 0.9024496445400005
key: test_recall
value: [0.85454545 0.90909091 0.90909091 0.96363636 0.92857143 0.94642857
0.92857143 0.96428571 0.89090909 0.92727273]
mean value: 0.9222402597402597
key: train_recall
value: [0.95991984 0.9498998 0.95390782 0.95190381 0.96184739 0.95381526
0.95783133 0.9497992 0.9498998 0.95791583]
mean value: 0.9546740066478338
key: test_roc_auc
value: [0.85584416 0.89204545 0.92775974 0.90146104 0.88246753 0.8913961
0.90064935 0.88214286 0.89090909 0.9 ]
mean value: 0.8924675324675325
key: train_roc_auc
value: [0.92373502 0.91671697 0.93378122 0.91470692 0.93984153 0.93782947
0.91879542 0.91578136 0.91482966 0.93987976]
mean value: 0.9255897336842359
key: test_jcc
value: [0.74603175 0.80645161 0.86206897 0.828125 0.8 0.81538462
0.82539683 0.80597015 0.80327869 0.82258065]
mean value: 0.8115288248173266
key: train_jcc
value: [0.86306306 0.85098743 0.87822878 0.84821429 0.88868275 0.88454376
0.85483871 0.8491921 0.84794275 0.88847584]
mean value: 0.8654169472771298
MCC on Blind test: 0.64
Accuracy on Blind test: 0.85
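The two "Blind test" lines report scores on the held-out 30% from the stratified 70/30 split announced at the top of the log. A minimal sketch of that kind of evaluation follows, assuming synthetic data and hypothetical variable names (X_blind, y_blind); the original script's actual split code is not shown in this log.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the full dataset (assumption).
X, y = make_classification(n_samples=600, n_features=20, random_state=42)

# Stratified 70/30 split: the 30% is never seen during fitting.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipe = Pipeline([
    ("prep", MinMaxScaler()),
    ("model", LogisticRegressionCV(random_state=42, max_iter=1000)),
])
pipe.fit(X_train, y_train)

# Score once on the blind set, rounded to two decimals as in the log.
y_pred = pipe.predict(X_blind)
blind_mcc = round(matthews_corrcoef(y_blind, y_pred), 2)
blind_acc = round(accuracy_score(y_blind, y_pred), 2)

print("MCC on Blind test:", blind_mcc)
print("Accuracy on Blind test:", blind_acc)
```

Because the scaler sits inside the pipeline, it is fitted on the training rows only, so no information from the blind set leaks into preprocessing.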
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0201087 0.01285744 0.01268411 0.01318502 0.0148437 0.01225019
0.01224089 0.01258802 0.01277614 0.01228714]
mean value: 0.013582134246826172
key: score_time
value: [0.01278663 0.00968266 0.00952196 0.00947142 0.00946116 0.00926948
0.00921249 0.00973344 0.00934577 0.00983071]
mean value: 0.009831571578979492
key: test_mcc
value: [0.57297043 0.50450356 0.49545455 0.62268473 0.67564935 0.64303575
0.46096379 0.55139323 0.71097366 0.67451348]
mean value: 0.5912142519514377
key: train_mcc
value: [0.6049526 0.61319134 0.60481447 0.59085458 0.59881849 0.60886065
0.62097028 0.61691639 0.59338778 0.57915832]
mean value: 0.6031924888496714
key: test_accuracy
value: [0.78378378 0.74774775 0.74774775 0.81081081 0.83783784 0.81981982
0.72972973 0.77477477 0.85454545 0.83636364]
mean value: 0.7943161343161343
key: train_accuracy
value: [0.80240722 0.80541625 0.80240722 0.79538616 0.79939819 0.80441324
0.81043129 0.80842528 0.79659319 0.78957916]
mean value: 0.801445719925307
key: test_fscore
value: [0.76470588 0.76666667 0.74545455 0.81415929 0.83928571 0.83050847
0.72222222 0.78632479 0.85964912 0.83018868]
mean value: 0.7959165385970846
key: train_fscore
value: [0.80475719 0.81381958 0.8028028 0.79393939 0.8 0.80519481
0.8119403 0.80957129 0.7992087 0.78957916]
mean value: 0.8030813212223025
key: test_precision
value: [0.82978723 0.70769231 0.74545455 0.79310345 0.83928571 0.79032258
0.75 0.75409836 0.83050847 0.8627451 ]
mean value: 0.7902997763667369
key: train_precision
value: [0.79607843 0.78084715 0.802 0.80040733 0.79681275 0.80119284
0.80473373 0.8039604 0.7890625 0.78957916]
mean value: 0.7964674282949357
key: test_recall
value: [0.70909091 0.83636364 0.74545455 0.83636364 0.83928571 0.875
0.69642857 0.82142857 0.89090909 0.8 ]
mean value: 0.8050324675324675
key: train_recall
value: [0.81362725 0.8496994 0.80360721 0.78757515 0.80321285 0.80923695
0.81927711 0.81526104 0.80961924 0.78957916]
mean value: 0.8100695366636889
key: test_roc_auc
value: [0.78311688 0.74853896 0.74772727 0.81103896 0.83782468 0.81931818
0.73003247 0.77435065 0.85454545 0.83636364]
mean value: 0.7942857142857143
key: train_roc_auc
value: [0.80239596 0.80537179 0.80240602 0.795394 0.79940202 0.80441807
0.81044016 0.80843213 0.79659319 0.78957916]
mean value: 0.8014432479416664
key: test_jcc
value: [0.61904762 0.62162162 0.5942029 0.68656716 0.72307692 0.71014493
0.56521739 0.64788732 0.75384615 0.70967742]
mean value: 0.6631289442461227
key: train_jcc
value: [0.67330017 0.68608414 0.67056856 0.65829146 0.66666667 0.67391304
0.68341709 0.680067 0.66556837 0.65231788]
mean value: 0.6710194374461457
MCC on Blind test: 0.37
Accuracy on Blind test: 0.74
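A quick consistency check on the Gaussian NB scores above: for a binary problem, the Jaccard index J and the F1 score F of the positive class are tied by J = F / (2 - F), because J = TP / (TP + FP + FN) and F = 2TP / (2TP + FP + FN). The sketch below applies the identity to the test_fscore and test_jcc values copied from this log; the variable names are illustrative.

```python
import numpy as np

# Per-fold Gaussian NB values copied from the log above.
test_fscore = np.array([0.76470588, 0.76666667, 0.74545455, 0.81415929,
                        0.83928571, 0.83050847, 0.72222222, 0.78632479,
                        0.85964912, 0.83018868])
test_jcc = np.array([0.61904762, 0.62162162, 0.5942029, 0.68656716,
                     0.72307692, 0.71014493, 0.56521739, 0.64788732,
                     0.75384615, 0.70967742])

# Jaccard index recomputed from F1 via J = F / (2 - F).
jcc_from_f = test_fscore / (2.0 - test_fscore)

# Largest per-fold disagreement; should be at the level of print rounding.
max_abs_diff = np.max(np.abs(jcc_from_f - test_jcc))
print("max |J(F) - J_logged|:", max_abs_diff)
```

The per-fold values agree to within print rounding, which suggests the fscore and jcc columns were computed from the same predictions. The means need not satisfy the identity, since it is not linear in F.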
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01392293 0.01701427 0.01723433 0.01750612 0.01728272 0.01705217
0.01722479 0.01711893 0.01728821 0.01710558]
mean value: 0.01687500476837158
key: score_time
value: [0.01087117 0.01245928 0.01249266 0.01245236 0.01248884 0.01242113
0.01247978 0.01248288 0.01243448 0.01248121]
mean value: 0.01230638027191162
key: test_mcc
value: [0.58827674 0.64014294 0.56929191 0.7137294 0.62939373 0.69891539
0.66004053 0.64303575 0.73323558 0.80119274]
mean value: 0.667725470808785
key: train_mcc
value: [0.70693545 0.72572666 0.68759086 0.71792245 0.68682818 0.71394469
0.68575288 0.7136947 0.7113771 0.71018107]
mean value: 0.7059954028409837
key: test_accuracy
value: [0.79279279 0.81981982 0.78378378 0.85585586 0.81081081 0.84684685
0.82882883 0.81981982 0.86363636 0.9 ]
mean value: 0.8322194922194922
key: train_accuracy
value: [0.85155466 0.86158475 0.84252758 0.85757272 0.84152457 0.8555667
0.84152457 0.8555667 0.85470942 0.85370741]
mean value: 0.8515839100467736
key: test_fscore
value: [0.8 0.82142857 0.78947368 0.85964912 0.82644628 0.85714286
0.83760684 0.83050847 0.87179487 0.90265487]
mean value: 0.8396705567815326
key: train_fscore
value: [0.85904762 0.86730769 0.84918348 0.86372361 0.84923664 0.86153846
0.84807692 0.86127168 0.85990338 0.85988484]
mean value: 0.8579174317858217
key: test_precision
value: [0.76666667 0.80701754 0.76271186 0.83050847 0.76923077 0.80952381
0.80327869 0.79032258 0.82258065 0.87931034]
mean value: 0.8041151387422574
key: train_precision
value: [0.8185118 0.8336414 0.81549815 0.82872928 0.80909091 0.82656827
0.81365314 0.82777778 0.83022388 0.82504604]
mean value: 0.8228740648484011
key: test_recall
value: [0.83636364 0.83636364 0.81818182 0.89090909 0.89285714 0.91071429
0.875 0.875 0.92727273 0.92727273]
mean value: 0.8789935064935065
key: train_recall
value: [0.90380762 0.90380762 0.88577154 0.90180361 0.8935743 0.89959839
0.88554217 0.89759036 0.89178357 0.89779559]
mean value: 0.896107475996169
key: test_roc_auc
value: [0.79318182 0.81996753 0.78409091 0.85616883 0.81006494 0.84626623
0.82840909 0.81931818 0.86363636 0.9 ]
mean value: 0.8321103896103896
key: train_roc_auc
value: [0.8515022 0.86154236 0.84248417 0.85752831 0.84157673 0.85561082
0.84156868 0.85560881 0.85470942 0.85370741]
mean value: 0.8515838906729121
key: test_jcc
value: [0.66666667 0.6969697 0.65217391 0.75384615 0.70422535 0.75
0.72058824 0.71014493 0.77272727 0.82258065]
mean value: 0.7249922863357584
key: train_jcc
value: [0.75292154 0.76570458 0.73789649 0.76013514 0.73797678 0.75675676
0.73622705 0.75634518 0.75423729 0.75420875]
mean value: 0.7512409553820072
MCC on Blind test: 0.52
Accuracy on Blind test: 0.8
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01607823 0.01308084 0.01161551 0.01328683 0.01202369 0.01301265
0.01235914 0.01249504 0.01264787 0.01307225]
mean value: 0.012967205047607422
key: score_time
value: [0.03294516 0.01623058 0.01591921 0.01628637 0.01585436 0.01614761
0.01592636 0.01668 0.01613188 0.01644087]
mean value: 0.017856240272521973
key: test_mcc
value: [0.62443328 0.62701706 0.67994289 0.79230071 0.62617314 0.6914393
0.63352948 0.76868784 0.76477489 0.67451348]
mean value: 0.688281206745035
key: train_mcc
value: [0.80337288 0.78897563 0.78668689 0.80851256 0.80226699 0.79915817
0.79748273 0.79748273 0.7955465 0.7970329 ]
mean value: 0.7976517990393334
key: test_accuracy
value: [0.81081081 0.81081081 0.83783784 0.89189189 0.81081081 0.83783784
0.81081081 0.88288288 0.88181818 0.83636364]
mean value: 0.8411875511875512
key: train_accuracy
value: [0.90070211 0.89368104 0.89167503 0.90371113 0.8996991 0.89869609
0.89769308 0.89769308 0.89679359 0.89779559]
mean value: 0.8978139830312581
key: test_fscore
value: [0.8173913 0.82051282 0.84482759 0.89830508 0.82352941 0.85483871
0.82926829 0.88888889 0.88495575 0.83018868]
mean value: 0.8492706530284919
key: train_fscore
value: [0.90416263 0.89708738 0.89655172 0.90625 0.90366089 0.90184645
0.90116279 0.90116279 0.90029042 0.90077821]
mean value: 0.9012953282848261
key: test_precision
value: [0.78333333 0.77419355 0.80327869 0.84126984 0.77777778 0.77941176
0.76119403 0.85245902 0.86206897 0.8627451 ]
mean value: 0.8097732063799168
key: train_precision
value: [0.87453184 0.8700565 0.8587156 0.88380952 0.86851852 0.87382298
0.87078652 0.87078652 0.87078652 0.87523629]
mean value: 0.8717050792015171
key: test_recall
value: [0.85454545 0.87272727 0.89090909 0.96363636 0.875 0.94642857
0.91071429 0.92857143 0.90909091 0.8 ]
mean value: 0.8951623376623377
key: train_recall
value: [0.93587174 0.9258517 0.93787575 0.92985972 0.94176707 0.93172691
0.93373494 0.93373494 0.93186373 0.92785571]
mean value: 0.9330142212135114
key: test_roc_auc
value: [0.8112013 0.81136364 0.83831169 0.89253247 0.81022727 0.83685065
0.8099026 0.88246753 0.88181818 0.83636364]
mean value: 0.8411038961038961
key: train_roc_auc
value: [0.9006668 0.89364874 0.89162864 0.90368488 0.89974125 0.89872919
0.89772919 0.89772919 0.89679359 0.89779559]
mean value: 0.8978147057166542
key: test_jcc
value: [0.69117647 0.69565217 0.73134328 0.81538462 0.7 0.74647887
0.70833333 0.8 0.79365079 0.70967742]
mean value: 0.7391696963046386
key: train_jcc
value: [0.82508834 0.81338028 0.8125 0.82857143 0.82425308 0.82123894
0.82010582 0.82010582 0.81866197 0.81946903]
mean value: 0.8203374701699758
MCC on Blind test: 0.46
Accuracy on Blind test: 0.77
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.06150103 0.05591226 0.06038022 0.05534601 0.05132461 0.05020452
0.05948067 0.06166291 0.05109692 0.06067967]
mean value: 0.056758880615234375
key: score_time
value: [0.01889443 0.019135 0.01991487 0.020509 0.01846647 0.01887274
0.01900005 0.02215457 0.01905775 0.01903105]
mean value: 0.01950359344482422
key: test_mcc
value: [0.64014294 0.73290291 0.73290291 0.75592959 0.84439989 0.75979502
0.74951538 0.72309474 0.79022225 0.87402845]
mean value: 0.7602934058473255
key: train_mcc
value: [0.81034285 0.81117338 0.80484205 0.80964527 0.80931593 0.81371644
0.81962105 0.80783682 0.80397646 0.79958404]
mean value: 0.8090054298398948
key: test_accuracy
value: [0.81981982 0.86486486 0.86486486 0.87387387 0.91891892 0.87387387
0.87387387 0.85585586 0.89090909 0.93636364]
mean value: 0.8773218673218673
key: train_accuracy
value: [0.90270812 0.90371113 0.8996991 0.90270812 0.90170512 0.90471414
0.90672016 0.90170512 0.8997996 0.89679359]
mean value: 0.9020264199411863
key: test_fscore
value: [0.82142857 0.86956522 0.86956522 0.88135593 0.92436975 0.8852459
0.87931034 0.86885246 0.89830508 0.9380531 ]
mean value: 0.8836051573887949
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
key: train_fscore
value: [0.90788224 0.9082218 0.90530303 0.90753098 0.90719697 0.90926457
0.91201514 0.90648855 0.9047619 0.90273843]
mean value: 0.9071403609895646
key: test_precision
value: [0.80701754 0.83333333 0.83333333 0.82539683 0.87301587 0.81818182
0.85 0.8030303 0.84126984 0.9137931 ]
mean value: 0.8398371974869252
key: train_precision
value: [0.86281588 0.86837294 0.85816876 0.86545455 0.85842294 0.86703097
0.86225403 0.86363636 0.86206897 0.85357143]
mean value: 0.8621796821708623
key: test_recall
value: [0.83636364 0.90909091 0.90909091 0.94545455 0.98214286 0.96428571
0.91071429 0.94642857 0.96363636 0.96363636]
mean value: 0.9330844155844156
key: train_recall
value: [0.95791583 0.95190381 0.95791583 0.95390782 0.96184739 0.95582329
0.96787149 0.95381526 0.95190381 0.95791583]
mean value: 0.9570820355570578
key: test_roc_auc
value: [0.81996753 0.86525974 0.86525974 0.87451299 0.91834416 0.87305195
0.87353896 0.85503247 0.89090909 0.93636364]
mean value: 0.8772240259740259
key: train_roc_auc
value: [0.90265269 0.90366275 0.89964065 0.90265672 0.90176538 0.90476535
0.90678143 0.90175733 0.8997996 0.89679359]
mean value: 0.9020275490740517
key: test_jcc
value: [0.6969697 0.76923077 0.76923077 0.78787879 0.859375 0.79411765
0.78461538 0.76811594 0.81538462 0.88333333]
mean value: 0.7928251945731166
key: train_jcc
value: [0.83130435 0.83187391 0.82698962 0.83071553 0.83015598 0.83362522
0.83826087 0.82897033 0.82608696 0.82271945]
mean value: 0.8300702209936055
MCC on Blind test: 0.61
Accuracy on Blind test: 0.83
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.62372518 2.43982482 4.13287544 4.33861852 4.25920129 2.64669728
3.125072 3.28911853 2.51461816 2.4663496 ]
mean value: 3.1836100816726685
key: score_time
value: [0.01274467 0.01836348 0.01655102 0.01567531 0.01266837 0.01298928
0.01277828 0.01279283 0.01288271 0.01289606]
mean value: 0.014034199714660644
key: test_mcc
value: [0.69373177 0.84137254 0.80286425 0.805216 0.8049036 0.82447186
0.81980519 0.87733514 0.85681441 0.83650191]
mean value: 0.8163016684437298
key: train_mcc
value: [0.95991082 0.92035652 0.99200779 0.98595987 0.96427298 0.9526244
0.97000492 0.96613216 0.94644118 0.94994749]
mean value: 0.960765812657737
key: test_accuracy
value: [0.84684685 0.91891892 0.9009009 0.9009009 0.9009009 0.90990991
0.90990991 0.93693694 0.92727273 0.91818182]
mean value: 0.907067977067977
key: train_accuracy
value: [0.97993982 0.95987964 0.99598796 0.99297894 0.98194584 0.97592778
0.98495486 0.98294885 0.97294589 0.9749499 ]
mean value: 0.9802459482656386
key: test_fscore
value: [0.8440367 0.92173913 0.89719626 0.90434783 0.90598291 0.91525424
0.91071429 0.94017094 0.92982456 0.91743119]
mean value: 0.9086698038672015
key: train_fscore
value: [0.97987928 0.96062992 0.99600798 0.99297894 0.98217822 0.97637795
0.98483316 0.98274112 0.97339901 0.97507478]
mean value: 0.9804100360349339
key: test_precision
value: [0.85185185 0.88333333 0.92307692 0.86666667 0.86885246 0.87096774
0.91071429 0.90163934 0.89830508 0.92592593]
mean value: 0.8901333616528921
key: train_precision
value: [0.98383838 0.94390716 0.99204771 0.9939759 0.96875 0.95752896
0.99185336 0.99383984 0.95736434 0.9702381 ]
mean value: 0.9753343747913725
key: test_recall
value: [0.83636364 0.96363636 0.87272727 0.94545455 0.94642857 0.96428571
0.91071429 0.98214286 0.96363636 0.90909091]
mean value: 0.9294480519480519
key: train_recall
value: [0.9759519 0.97795591 1. 0.99198397 0.99598394 0.99598394
0.97791165 0.97188755 0.98997996 0.97995992]
mean value: 0.9857598731599746
key: test_roc_auc
value: [0.84675325 0.91931818 0.90064935 0.9012987 0.90048701 0.90941558
0.9099026 0.93652597 0.92727273 0.91818182]
mean value: 0.9069805194805195
key: train_roc_auc
value: [0.97994382 0.95986149 0.99598394 0.99297994 0.9819599 0.97594788
0.98494781 0.98293776 0.97294589 0.9749499 ]
mean value: 0.9802458330315249
key: test_jcc
value: [0.73015873 0.85483871 0.81355932 0.82539683 0.828125 0.84375
0.83606557 0.88709677 0.86885246 0.84745763]
mean value: 0.8335301021365951
key: train_jcc
value: [0.96055227 0.92424242 0.99204771 0.98605578 0.96498054 0.95384615
0.97011952 0.96606786 0.94817658 0.95136187]
mean value: 0.961745071907173
MCC on Blind test: 0.61
Accuracy on Blind test: 0.84
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.07478142 0.07933569 0.06970549 0.07234645 0.08080792 0.06816864
0.06994104 0.09257722 0.09551525 0.06402373]
mean value: 0.07672028541564942
key: score_time
value: [0.00968766 0.00928879 0.0095427 0.00937366 0.00937152 0.00922513
0.00954008 0.00962567 0.00966001 0.0095737 ]
mean value: 0.009488892555236817
key: test_mcc
value: [0.6962563 0.86102173 0.73528651 0.80188377 0.78818464 0.87733514
0.78420577 0.80305531 0.80013226 0.78389404]
mean value: 0.7931255470199186
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84684685 0.92792793 0.86486486 0.9009009 0.89189189 0.93693694
0.89189189 0.9009009 0.9 0.89090909]
mean value: 0.8953071253071253
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.83809524 0.93103448 0.85436893 0.89908257 0.89830508 0.94017094
0.89473684 0.89908257 0.89908257 0.88679245]
mean value: 0.8940751679166867
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88 0.8852459 0.91666667 0.90740741 0.85483871 0.90163934
0.87931034 0.9245283 0.90740741 0.92156863]
mean value: 0.89786127112259
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.8 0.98181818 0.8 0.89090909 0.94642857 0.98214286
0.91071429 0.875 0.89090909 0.85454545]
mean value: 0.8932467532467532
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.84642857 0.92840909 0.86428571 0.90081169 0.8913961 0.93652597
0.89172078 0.90113636 0.9 0.89090909]
mean value: 0.8951623376623377
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.72131148 0.87096774 0.74576271 0.81666667 0.81538462 0.88709677
0.80952381 0.81666667 0.81666667 0.79661017]
mean value: 0.8096657297803226
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.7
Accuracy on Blind test: 0.88
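Note: the per-fold scores above can be cross-checked by hand. The sketch below assumes a confusion matrix for the first Decision Tree fold (tp=44, fp=6, tn=50, fn=11). These counts are not printed in the log; they are an assumption chosen because they reproduce the logged fold-1 values. The function name fold_metrics is hypothetical. The sketch also assumes roc_auc was scored on hard class predictions, in which case it reduces to balanced accuracy.

```python
from math import sqrt


def fold_metrics(tp, fp, tn, fn):
    """Compute the scores named in the log from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    # MCC denominator; zero only if a whole row or column of the matrix is empty
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "jcc": tp / (tp + fp + fn),    # Jaccard index of the positive class
        # With 0/1 predictions the ROC curve has a single operating point,
        # so the area under it is the mean of recall and specificity.
        "roc_auc": (recall + specificity) / 2,
    }


# Assumed counts for Decision Tree, fold 1 (55 positives, 56 negatives).
metrics = fold_metrics(tp=44, fp=6, tn=50, fn=11)
for name, value in metrics.items():
    print(f"test_{name}: {value:.8f}")
```

Under these assumed counts the printed values agree with the first entries of test_accuracy, test_precision, test_recall, test_fscore, test_mcc, test_jcc and test_roc_auc in the Decision Tree block above.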
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.1862402 0.19378781 0.18132305 0.1813972 0.18763399 0.18403673
0.18406081 0.18009973 0.1800344 0.17876792]
mean value: 0.18373818397521974
key: score_time
value: [0.0201416 0.02079272 0.02035451 0.02068305 0.0194726 0.01997471
0.02043581 0.0189383 0.01899672 0.01892877]
mean value: 0.0198718786239624
key: test_mcc
value: [0.7306455 0.78420577 0.74772727 0.78859019 0.87733514 0.856354
0.78567192 0.84111937 0.87287156 0.8376106 ]
mean value: 0.8122131325526036
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86486486 0.89189189 0.87387387 0.89189189 0.93693694 0.92792793
0.89189189 0.91891892 0.93636364 0.91818182]
mean value: 0.9052743652743653
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85981308 0.88888889 0.87272727 0.89655172 0.94017094 0.92982456
0.89655172 0.92307692 0.93693694 0.92035398]
mean value: 0.9064896037893366
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88461538 0.90566038 0.87272727 0.85245902 0.90163934 0.9137931
0.86666667 0.8852459 0.92857143 0.89655172]
mean value: 0.8907930219820532
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.83636364 0.87272727 0.87272727 0.94545455 0.98214286 0.94642857
0.92857143 0.96428571 0.94545455 0.94545455]
mean value: 0.923961038961039
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86461039 0.89172078 0.87386364 0.89237013 0.93652597 0.92775974
0.89155844 0.91850649 0.93636364 0.91818182]
mean value: 0.905146103896104
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75409836 0.8 0.77419355 0.8125 0.88709677 0.86885246
0.8125 0.85714286 0.88135593 0.85245902]
mean value: 0.8300198947992465
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.84
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01281238 0.01285768 0.01273155 0.01263881 0.01268411 0.01274562
0.01275444 0.01256347 0.01274872 0.01252794]
mean value: 0.012706470489501954
key: score_time
value: [0.0092659 0.00906849 0.00905991 0.00917387 0.0091598 0.00904346
0.00915551 0.00906682 0.00916243 0.00901937]
mean value: 0.009117555618286134
key: test_mcc
value: [0.58557976 0.57765823 0.66058982 0.62443328 0.71884134 0.58760899
0.6962563 0.53199093 0.67272727 0.5304385 ]
mean value: 0.6186124436559258
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.79279279 0.78378378 0.82882883 0.81081081 0.85585586 0.79279279
0.84684685 0.76576577 0.83636364 0.76363636]
mean value: 0.8077477477477477
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.78899083 0.8 0.83478261 0.8173913 0.86666667 0.8034188
0.85470085 0.76363636 0.83636364 0.77586207]
mean value: 0.8141813132483394
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7962963 0.73846154 0.8 0.78333333 0.8125 0.7704918
0.81967213 0.77777778 0.83636364 0.73770492]
mean value: 0.7872601434691598
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.78181818 0.87272727 0.87272727 0.85454545 0.92857143 0.83928571
0.89285714 0.75 0.83636364 0.81818182]
mean value: 0.8447077922077922
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79269481 0.78457792 0.82922078 0.8112013 0.85519481 0.79237013
0.84642857 0.76590909 0.83636364 0.76363636]
mean value: 0.8077597402597403
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.65151515 0.66666667 0.71641791 0.69117647 0.76470588 0.67142857
0.74626866 0.61764706 0.71875 0.63380282]
mean value: 0.6878379185440683
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.77
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [3.34462881 3.33502793 3.3337965 3.32841372 3.34471703 3.3171823
3.30564308 3.33460903 3.3558104 3.32796764]
mean value: 3.3327796459198
key: score_time
value: [0.09969783 0.10093069 0.09836888 0.09869719 0.0972259 0.09813404
0.09791899 0.1053617 0.10310555 0.09741282]
mean value: 0.09968535900115967
key: test_mcc
value: [0.82027988 0.80194805 0.89242811 0.87398511 0.89414155 0.84439989
0.856354 0.94730174 0.94561086 0.90924121]
mean value: 0.878569039813939
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90990991 0.9009009 0.94594595 0.93693694 0.94594595 0.91891892
0.92792793 0.97297297 0.97272727 0.95454545]
mean value: 0.9386732186732187
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90740741 0.9009009 0.94444444 0.93577982 0.94827586 0.92436975
0.92982456 0.97391304 0.97297297 0.95495495]
mean value: 0.9392843712044336
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9245283 0.89285714 0.96226415 0.94444444 0.91666667 0.87301587
0.9137931 0.94915254 0.96428571 0.94642857]
mean value: 0.9287436511349758
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.89090909 0.90909091 0.92727273 0.92727273 0.98214286 0.98214286
0.94642857 1. 0.98181818 0.96363636]
mean value: 0.9510714285714286
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90974026 0.90097403 0.94577922 0.93685065 0.94561688 0.91834416
0.92775974 0.97272727 0.97272727 0.95454545]
mean value: 0.9385064935064935
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83050847 0.81967213 0.89473684 0.87931034 0.90163934 0.859375
0.86885246 0.94915254 0.94736842 0.9137931 ]
mean value: 0.8864408662809139
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.71
Accuracy on Blind test: 0.89
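Note: each metric in this log is written as a `key:` line, a bracketed `value:` array that may wrap onto a second line, and a `mean value:` line. A minimal sketch of reading one such block back and re-checking its mean follows. It assumes only that layout; `parse_cv_blocks` is a hypothetical helper, not part of the script that produced this log, and it keeps only the latest block per key.

```python
import re
from statistics import fmean

# Matches plain decimals such as "0.82027988" or "1." and optional exponents.
NUMBER = re.compile(r"-?\d+\.?\d*(?:[eE][-+]?\d+)?")


def parse_cv_blocks(text):
    """Collect 'key / value / mean value' triples from a chunk of log text.

    Hypothetical helper: a later block with the same key overwrites an
    earlier one, so pass the text of a single model at a time.
    """
    blocks = {}
    key = None
    in_values = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("key:"):
            key = line[len("key:"):].strip()
            blocks[key] = {"values": [], "mean": None}
            in_values = False
        elif key is None:
            continue
        elif line.startswith("mean value:"):
            blocks[key]["mean"] = float(line[len("mean value:"):])
            in_values = False
        elif line.startswith("value:") or in_values:
            # The array may continue on the next line until "]" appears.
            blocks[key]["values"].extend(float(tok) for tok in NUMBER.findall(line))
            in_values = "]" not in line
    return blocks


# The test_mcc block of the Random Forest run, copied from the log above.
sample = """\
key: test_mcc
value: [0.82027988 0.80194805 0.89242811 0.87398511 0.89414155 0.84439989
 0.856354   0.94730174 0.94561086 0.90924121]
mean value: 0.878569039813939
"""

parsed = parse_cv_blocks(sample)
recomputed = fmean(parsed["test_mcc"]["values"])
print(len(parsed["test_mcc"]["values"]), recomputed, parsed["test_mcc"]["mean"])
```

The printed fold values are rounded to eight decimals, so the recomputed mean should agree with the logged mean only to roughly that precision.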
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.22340345 1.18731499 1.18555117 1.22307873 1.22605681 1.20699906
1.19885516 1.21126437 1.21934485 1.25942731]
mean value: 1.214129590988159
key: score_time
value: [0.20892453 0.24950743 0.22536755 0.18695331 0.23490357 0.29782224
0.26686096 0.28301716 0.21349239 0.1928103 ]
mean value: 0.23596594333648682
key: test_mcc
value: [0.80188377 0.78434561 0.87508299 0.91006494 0.86075909 0.84439989
0.856354 0.93029809 0.92727273 0.90924121]
mean value: 0.86997023122815
key: train_mcc
value: [0.9540162 0.95395495 0.94606663 0.94592996 0.94198 0.94994386
0.94198 0.94390049 0.94598487 0.94593927]
mean value: 0.9469696225050219
key: test_accuracy
value: [0.9009009 0.89189189 0.93693694 0.95495495 0.92792793 0.91891892
0.92792793 0.96396396 0.96363636 0.95454545]
mean value: 0.9341605241605242
key: train_accuracy
value: [0.97693079 0.97693079 0.97291876 0.97291876 0.97091274 0.97492477
0.97091274 0.97191575 0.97294589 0.97294589]
mean value: 0.9734256878852992
key: test_fscore
value: [0.89908257 0.89285714 0.93457944 0.95495495 0.93220339 0.92436975
0.92982456 0.96551724 0.96363636 0.95495495]
mean value: 0.935198036497558
key: train_fscore
value: [0.97715988 0.97711443 0.97324083 0.97313433 0.97114428 0.97507478
0.97114428 0.97205589 0.97313433 0.97308076]
mean value: 0.9736283776755992
key: test_precision
value: [0.90740741 0.87719298 0.96153846 0.94642857 0.88709677 0.87301587
0.9137931 0.93333333 0.96363636 0.94642857]
mean value: 0.9209871441886547
key: train_precision
value: [0.96850394 0.97035573 0.9627451 0.96640316 0.96252465 0.96831683
0.96252465 0.96626984 0.96640316 0.96825397]
mean value: 0.966230104125473
key: test_recall
value: [0.89090909 0.90909091 0.90909091 0.96363636 0.98214286 0.98214286
0.94642857 1. 0.96363636 0.96363636]
mean value: 0.9510714285714286
key: train_recall
value: [0.98597194 0.98396794 0.98396794 0.97995992 0.97991968 0.98192771
0.97991968 0.97791165 0.97995992 0.97795591]
mean value: 0.9811462281993706
key: test_roc_auc
value: [0.90081169 0.89204545 0.93668831 0.95503247 0.92743506 0.91834416
0.92775974 0.96363636 0.96363636 0.95454545]
mean value: 0.9339935064935065
key: train_roc_auc
value: [0.97692171 0.97692373 0.97290766 0.97291169 0.97092176 0.97493179
0.97092176 0.97192176 0.97294589 0.97294589]
mean value: 0.9734253647857964
key: test_jcc
value: [0.81666667 0.80645161 0.87719298 0.9137931 0.87301587 0.859375
0.86885246 0.93333333 0.92982456 0.9137931 ]
mean value: 0.8792298695691693
key: train_jcc
value: [0.95533981 0.95525292 0.94787645 0.94767442 0.94390716 0.95136187
0.94390716 0.94563107 0.94767442 0.94757282]
mean value: 0.9486198073744585
MCC on Blind test: 0.76
Accuracy on Blind test: 0.9
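Note: the FutureWarning logged for Random Forest2 states that `max_features='auto'` is deprecated and that `'sqrt'` keeps the past behaviour for random forest classifiers. A minimal sketch of that change follows, using the constructor arguments printed in `Model func:` above. `modernise_rf_params` is a hypothetical helper introduced only for illustration; nothing here imports scikit-learn.

```python
def modernise_rf_params(params):
    """Return a copy of params with the deprecated max_features='auto'
    replaced by 'sqrt', as the warning text recommends.

    Hypothetical helper; the input dict is left untouched.
    """
    updated = dict(params)
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


# Arguments as printed for Random Forest2 in this log.
rf2_logged = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}

rf2_params = modernise_rf_params(rf2_logged)
print(rf2_params)

# With scikit-learn installed, the classifier could then be built as
# RandomForestClassifier(**rf2_params); per the warning text, this is
# expected to behave as before without the deprecation message.
```

Dropping the `max_features` key entirely should have the same effect, since the warning names `'sqrt'` as the default for these classifiers.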
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02724433 0.01684785 0.0170033 0.01703453 0.01710773 0.01734543
0.01699972 0.01705408 0.01712227 0.01719022]
mean value: 0.018094944953918456
key: score_time
value: [0.0127182 0.01238012 0.01267314 0.01244044 0.01244259 0.01242185
0.01242566 0.0123899 0.01235771 0.01240087]
mean value: 0.01246504783630371
key: test_mcc
value: [0.58827674 0.64014294 0.56929191 0.7137294 0.62939373 0.69891539
0.66004053 0.64303575 0.73323558 0.80119274]
mean value: 0.667725470808785
key: train_mcc
value: [0.70693545 0.72572666 0.68759086 0.71792245 0.68682818 0.71394469
0.68575288 0.7136947 0.7113771 0.71018107]
mean value: 0.7059954028409837
key: test_accuracy
value: [0.79279279 0.81981982 0.78378378 0.85585586 0.81081081 0.84684685
0.82882883 0.81981982 0.86363636 0.9 ]
mean value: 0.8322194922194922
key: train_accuracy
value: [0.85155466 0.86158475 0.84252758 0.85757272 0.84152457 0.8555667
0.84152457 0.8555667 0.85470942 0.85370741]
mean value: 0.8515839100467736
key: test_fscore
value: [0.8 0.82142857 0.78947368 0.85964912 0.82644628 0.85714286
0.83760684 0.83050847 0.87179487 0.90265487]
mean value: 0.8396705567815326
key: train_fscore
value: [0.85904762 0.86730769 0.84918348 0.86372361 0.84923664 0.86153846
0.84807692 0.86127168 0.85990338 0.85988484]
mean value: 0.8579174317858217
key: test_precision
value: [0.76666667 0.80701754 0.76271186 0.83050847 0.76923077 0.80952381
0.80327869 0.79032258 0.82258065 0.87931034]
mean value: 0.8041151387422574
key: train_precision
value: [0.8185118 0.8336414 0.81549815 0.82872928 0.80909091 0.82656827
0.81365314 0.82777778 0.83022388 0.82504604]
mean value: 0.8228740648484011
key: test_recall
value: [0.83636364 0.83636364 0.81818182 0.89090909 0.89285714 0.91071429
0.875 0.875 0.92727273 0.92727273]
mean value: 0.8789935064935065
key: train_recall
value: [0.90380762 0.90380762 0.88577154 0.90180361 0.8935743 0.89959839
0.88554217 0.89759036 0.89178357 0.89779559]
mean value: 0.896107475996169
key: test_roc_auc
value: [0.79318182 0.81996753 0.78409091 0.85616883 0.81006494 0.84626623
0.82840909 0.81931818 0.86363636 0.9 ]
mean value: 0.8321103896103896
key: train_roc_auc
value: [0.8515022 0.86154236 0.84248417 0.85752831 0.84157673 0.85561082
0.84156868 0.85560881 0.85470942 0.85370741]
mean value: 0.8515838906729121
key: test_jcc
value: [0.66666667 0.6969697 0.65217391 0.75384615 0.70422535 0.75
0.72058824 0.71014493 0.77272727 0.82258065]
mean value: 0.7249922863357584
key: train_jcc
value: [0.75292154 0.76570458 0.73789649 0.76013514 0.73797678 0.75675676
0.73622705 0.75634518 0.75423729 0.75420875]
mean value: 0.7512409553820072
MCC on Blind test: 0.52
Accuracy on Blind test: 0.8
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.16932011 0.14638591 0.19654107 0.18655062 0.15451241 0.14728117
0.14999509 0.14579248 0.15139866 0.15017271]
mean value: 0.15979502201080323
key: score_time
value: [0.01136637 0.01236463 0.0134182 0.01141644 0.0115819 0.01142669
0.01245141 0.01140976 0.01291013 0.01137543]
mean value: 0.01197209358215332
key: test_mcc
value: [0.87402597 0.87402597 0.87398511 0.94735177 0.86471225 0.87508299
0.856354 0.94730174 0.82035423 0.9104463 ]
mean value: 0.8843640328352657
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93693694 0.93693694 0.93693694 0.97297297 0.92792793 0.93693694
0.92792793 0.97297297 0.90909091 0.95454545]
mean value: 0.9413185913185913
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93693694 0.93693694 0.93577982 0.97345133 0.93333333 0.93913043
0.92982456 0.97391304 0.9122807 0.95575221]
mean value: 0.9427339304962741
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92857143 0.92857143 0.94444444 0.94827586 0.875 0.91525424
0.9137931 0.94915254 0.88135593 0.93103448]
mean value: 0.9215453461727571
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.94545455 0.94545455 0.92727273 1. 1. 0.96428571
0.94642857 1. 0.94545455 0.98181818]
mean value: 0.9656168831168831
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93701299 0.93701299 0.93685065 0.97321429 0.92727273 0.93668831
0.92775974 0.97272727 0.90909091 0.95454545]
mean value: 0.9412175324675325
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88135593 0.88135593 0.87931034 0.94827586 0.875 0.8852459
0.86885246 0.94915254 0.83870968 0.91525424]
mean value: 0.8922512889039441
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.07442832 0.07900333 0.06712317 0.0942688 0.07648659 0.10709262
0.08123636 0.103127 0.07679653 0.08166337]
mean value: 0.08412261009216308
key: score_time
value: [0.02036667 0.01255226 0.01248574 0.02048302 0.01988649 0.01711273
0.01995468 0.02388263 0.01327038 0.02085662]
mean value: 0.018085122108459473
key: test_mcc
value: [0.66058982 0.74983877 0.78434561 0.79230071 0.71884134 0.70720342
0.68237361 0.78818464 0.82035423 0.85967619]
mean value: 0.7563708342016795
key: train_mcc
value: [0.84314661 0.84170481 0.84384054 0.84384054 0.84930373 0.84740528
0.83959549 0.8341973 0.83810503 0.84400673]
mean value: 0.8425146060693494
key: test_accuracy
value: [0.82882883 0.87387387 0.89189189 0.89189189 0.85585586 0.84684685
0.83783784 0.89189189 0.90909091 0.92727273]
mean value: 0.8755282555282555
key: train_accuracy
value: [0.92076229 0.91975928 0.92076229 0.92076229 0.92377131 0.9227683
0.91875627 0.91574724 0.91783567 0.92084168]
mean value: 0.9201766622512829
key: test_fscore
value: [0.83478261 0.87719298 0.89285714 0.89830508 0.86666667 0.86178862
0.85 0.89830508 0.9122807 0.93103448]
mean value: 0.8823213372566312
key: train_fscore
value: [0.92322643 0.92263056 0.9236715 0.9236715 0.92607004 0.92517007
0.9214355 0.91891892 0.92084942 0.9236715 ]
mean value: 0.9229315433333662
key: test_precision
value: [0.8 0.84745763 0.87719298 0.84126984 0.8125 0.79104478
0.796875 0.85483871 0.88135593 0.8852459 ]
mean value: 0.8387780770484182
key: train_precision
value: [0.89622642 0.89158879 0.89179104 0.89179104 0.89811321 0.89642185
0.89118199 0.88475836 0.88826816 0.89179104]
mean value: 0.8921931897070797
key: test_recall
value: [0.87272727 0.90909091 0.90909091 0.96363636 0.92857143 0.94642857
0.91071429 0.94642857 0.94545455 0.98181818]
mean value: 0.9313961038961038
key: train_recall
value: [0.95190381 0.95591182 0.95791583 0.95791583 0.95582329 0.95582329
0.95381526 0.95582329 0.95591182 0.95791583]
mean value: 0.9558760090462048
key: test_roc_auc
value: [0.82922078 0.87418831 0.89204545 0.89253247 0.85519481 0.84594156
0.83717532 0.8913961 0.90909091 0.92727273]
mean value: 0.8754058441558441
key: train_roc_auc
value: [0.92073102 0.91972298 0.92072498 0.92072498 0.92380343 0.92280143
0.9187914 0.9157874 0.91783567 0.92084168]
mean value: 0.9201764975734601
key: test_jcc
value: [0.71641791 0.78125 0.80645161 0.81538462 0.76470588 0.75714286
0.73913043 0.81538462 0.83870968 0.87096774]
mean value: 0.7905545347753463
key: train_jcc
value: [0.85740072 0.85637343 0.85816876 0.85816876 0.86231884 0.86075949
0.85431655 0.85 0.85330948 0.85816876]
mean value: 0.8568984796998163
MCC on Blind test: 0.57
Accuracy on Blind test: 0.82
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01855206 0.01753569 0.01760435 0.01728177 0.01746941 0.01717734
0.01688004 0.01726413 0.01733756 0.01808047]
mean value: 0.017518281936645508
key: score_time
value: [0.01274776 0.01306725 0.01289153 0.01265955 0.0127809 0.01270652
0.01321983 0.01303244 0.01284075 0.01262331]
mean value: 0.01285698413848877
key: test_mcc
value: [0.55139323 0.53257612 0.44274592 0.60540128 0.69373177 0.6962563
0.60540128 0.6962563 0.6401844 0.72835704]
mean value: 0.6192303636153702
key: train_mcc
value: [0.63490435 0.64093618 0.63093355 0.64504601 0.6248833 0.64093185
0.62689745 0.64098055 0.62750153 0.60320641]
mean value: 0.6316221168713542
key: test_accuracy
value: [0.77477477 0.76576577 0.72072072 0.8018018 0.84684685 0.84684685
0.8018018 0.84684685 0.81818182 0.86363636]
mean value: 0.8087223587223588
key: train_accuracy
value: [0.81745236 0.82046138 0.81544634 0.8224674 0.81243731 0.82046138
0.81344032 0.82046138 0.81362725 0.80160321]
mean value: 0.8157858344572797
key: test_fscore
value: [0.76190476 0.75471698 0.7047619 0.80701754 0.84955752 0.85470085
0.7962963 0.85470085 0.82758621 0.85981308]
mean value: 0.8071056010488992
key: train_fscore
value: [0.81763527 0.8201005 0.81673307 0.82103134 0.81168177 0.81973817
0.8125 0.8190091 0.81097561 0.80160321]
mean value: 0.8151008041422524
key: test_precision
value: [0.8 0.78431373 0.74 0.77966102 0.84210526 0.81967213
0.82692308 0.81967213 0.78688525 0.88461538]
mean value: 0.8083847975332427
key: train_precision
value: [0.81763527 0.82258065 0.81188119 0.82857143 0.81414141 0.82222222
0.81578947 0.82484725 0.82268041 0.80160321]
mean value: 0.8181952511733585
key: test_recall
value: [0.72727273 0.72727273 0.67272727 0.83636364 0.85714286 0.89285714
0.76785714 0.89285714 0.87272727 0.83636364]
mean value: 0.8083441558441559
key: train_recall
value: [0.81763527 0.81763527 0.82164329 0.81362725 0.80923695 0.81726908
0.80923695 0.81325301 0.7995992 0.80160321]
mean value: 0.8120739470909691
key: test_roc_auc
value: [0.77435065 0.76542208 0.72029221 0.80211039 0.84675325 0.84642857
0.80211039 0.84642857 0.81818182 0.86363636]
mean value: 0.8085714285714286
key: train_roc_auc
value: [0.81745217 0.82046422 0.81544012 0.82247628 0.81243411 0.82045819
0.81343611 0.82045416 0.81362725 0.80160321]
mean value: 0.815784581210614
key: test_jcc
value: [0.61538462 0.60606061 0.54411765 0.67647059 0.73846154 0.74626866
0.66153846 0.74626866 0.70588235 0.75409836]
mean value: 0.6794551483769089
key: train_jcc
value: [0.69152542 0.69505963 0.69023569 0.69639794 0.68305085 0.69453925
0.68421053 0.69349315 0.68205128 0.66889632]
mean value: 0.6879460057585034
MCC on Blind test: 0.5
Accuracy on Blind test: 0.78
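Note on test_jcc: for a binary positive class, the Jaccard score and the F-score are tied by J = F / (2 - F), so the jcc rows repeat the information in the fscore rows on a different scale. The sketch below checks that identity against the first two folds of the block above. The function name is hypothetical and is not taken from the logged script.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the equivalent Jaccard index.

    With TP, FP, FN counts: F1 = 2TP / (2TP + FP + FN) and
    J = TP / (TP + FP + FN), which rearranges to J = F1 / (2 - F1).
    """
    return f1 / (2.0 - f1)


if __name__ == "__main__":
    # (test_fscore, test_jcc) pairs copied from the first two folds logged above.
    logged_pairs = [(0.76190476, 0.61538462), (0.75471698, 0.60606061)]
    for f1, jcc in logged_pairs:
        print(f"f1={f1:.8f} -> jaccard={jaccard_from_f1(f1):.8f} (logged {jcc:.8f})")
```

Because the mapping is monotonic, ranking models by mean test_jcc will usually agree with ranking by mean test_fscore, although the two per-fold means are not themselves related by the same formula.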
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03212476 0.02761483 0.03664923 0.02784562 0.0398612 0.0308187
0.02523756 0.03237534 0.03949404 0.03122282]
mean value: 0.03232440948486328
key: score_time
value: [0.01321888 0.012743 0.01287627 0.0127027 0.01272225 0.01279831
0.01288557 0.01277661 0.01274395 0.01294923]
mean value: 0.01284167766571045
key: test_mcc
value: [0.67994289 0.68231769 0.83897362 0.7763355 0.70340005 0.74951538
0.63046459 0.71350607 0.80332642 0.80119274]
mean value: 0.7378974954164732
key: train_mcc
value: [0.82132227 0.75838527 0.83021815 0.81368596 0.75818559 0.83584066
0.75197395 0.73170243 0.82534613 0.82878108]
mean value: 0.7955441479582855
key: test_accuracy
value: [0.83783784 0.82882883 0.91891892 0.88288288 0.84684685 0.87387387
0.81081081 0.83783784 0.9 0.9 ]
mean value: 0.8637837837837838
key: train_accuracy
value: [0.90672016 0.87462387 0.91474423 0.90270812 0.87061184 0.91775326
0.87362086 0.85255767 0.91182365 0.91382766]
mean value: 0.893899132266539
key: test_fscore
value: [0.84482759 0.8 0.91588785 0.8907563 0.83495146 0.87931034
0.7961165 0.86153846 0.8952381 0.90265487]
mean value: 0.8621281469221023
key: train_fscore
value: [0.91283974 0.86427796 0.91658489 0.90926099 0.85521886 0.91881188
0.86595745 0.86979628 0.90890269 0.91601563]
mean value: 0.8937666354668264
key: test_precision
value: [0.80327869 0.95 0.94230769 0.828125 0.91489362 0.85
0.87234043 0.75675676 0.94 0.87931034]
mean value: 0.8737012524969817
key: train_precision
value: [0.85739437 0.94312796 0.89807692 0.85263158 0.96946565 0.90625
0.92081448 0.77812995 0.94004283 0.89333333]
mean value: 0.8959267071141968
key: test_recall
value: [0.89090909 0.69090909 0.89090909 0.96363636 0.76785714 0.91071429
0.73214286 1. 0.85454545 0.92727273]
mean value: 0.8628896103896104
key: train_recall
value: [0.9759519 0.79759519 0.93587174 0.9739479 0.76506024 0.93172691
0.81726908 0.98594378 0.87975952 0.93987976]
mean value: 0.9003006012024048
key: test_roc_auc
value: [0.83831169 0.8275974 0.91866883 0.8836039 0.84756494 0.87353896
0.81152597 0.83636364 0.9 0.9 ]
mean value: 0.8637175324675325
key: train_roc_auc
value: [0.90665065 0.87470121 0.91472302 0.9026366 0.87050607 0.91776726
0.8735644 0.85269133 0.91182365 0.91382766]
mean value: 0.8938891839904708
key: test_jcc
value: [0.73134328 0.66666667 0.84482759 0.8030303 0.71666667 0.78461538
0.66129032 0.75675676 0.81034483 0.82258065]
mean value: 0.7598122442852906
key: train_jcc
value: [0.83965517 0.76099426 0.84601449 0.83361921 0.74705882 0.84981685
0.76360225 0.76959248 0.83301708 0.84504505]
mean value: 0.8088415664093777
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03614044 0.03646159 0.03469062 0.0356729 0.0328548 0.03632474
0.04048204 0.03571701 0.04990339 0.0309155 ]
mean value: 0.03691630363464356
key: score_time
value: [0.01274538 0.01309633 0.01829362 0.02165842 0.01910233 0.02067685
0.01297855 0.01283383 0.01271367 0.01318717]
mean value: 0.01572861671447754
key: test_mcc
value: [0.65842676 0.80845318 0.82205752 0.72987013 0.69004484 0.73247207
0.33903271 0.62431781 0.78181818 0.56602204]
mean value: 0.6752515240726457
key: train_mcc
value: [0.85974878 0.82671395 0.83090715 0.82857913 0.71937214 0.81951538
0.31729713 0.73967951 0.85253237 0.69525894]
mean value: 0.7489604482361627
key: test_accuracy
value: [0.82882883 0.9009009 0.90990991 0.86486486 0.82882883 0.86486486
0.61261261 0.79279279 0.89090909 0.76363636]
mean value: 0.8258149058149058
key: train_accuracy
value: [0.92978937 0.90972919 0.91273821 0.91273821 0.8445336 0.90972919
0.59779338 0.86058175 0.9258517 0.83366733]
mean value: 0.863715193677224
key: test_fscore
value: [0.82242991 0.90598291 0.9122807 0.86486486 0.85271318 0.87179487
0.3943662 0.75268817 0.89090909 0.71111111]
mean value: 0.7979141000479969
key: train_fscore
value: [0.93055556 0.91541353 0.91753555 0.90890052 0.86391572 0.90909091
0.33499171 0.84293785 0.92745098 0.80652681]
mean value: 0.8357319130757249
key: test_precision
value: [0.84615385 0.85483871 0.88135593 0.85714286 0.75342466 0.83606557
0.93333333 0.94594595 0.89090909 0.91428571]
mean value: 0.8713455660956335
key: train_precision
value: [0.92141454 0.8619469 0.8705036 0.95175439 0.7675507 0.91463415
0.96190476 0.96382429 0.90786948 0.9637883 ]
mean value: 0.9085191106333975
key: test_recall
value: [0.8 0.96363636 0.94545455 0.87272727 0.98214286 0.91071429
0.25 0.625 0.89090909 0.58181818]
mean value: 0.7822402597402597
key: train_recall
value: [0.93987976 0.9759519 0.96993988 0.86973948 0.98795181 0.90361446
0.20281124 0.74899598 0.94789579 0.69338677]
mean value: 0.8240167081150253
key: test_roc_auc
value: [0.82857143 0.90146104 0.91022727 0.86493506 0.82743506 0.86444805
0.61590909 0.79431818 0.89090909 0.76363636]
mean value: 0.8261850649350649
key: train_roc_auc
value: [0.92977924 0.9096627 0.91268078 0.91278139 0.84467731 0.90972306
0.59739761 0.86046994 0.9258517 0.83366733]
mean value: 0.8636691052788308
key: test_jcc
value: [0.6984127 0.828125 0.83870968 0.76190476 0.74324324 0.77272727
0.24561404 0.60344828 0.80327869 0.55172414]
mean value: 0.6847187791112744
key: train_jcc
value: [0.87012987 0.8440208 0.84763573 0.83301344 0.76043277 0.83333333
0.20119522 0.72851562 0.86471664 0.67578125]
mean value: 0.7458774660122005
MCC on Blind test: 0.67
Accuracy on Blind test: 0.87
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.39727902 0.38057208 0.38459516 0.39634442 0.40336704 0.36749411
0.36946654 0.36745024 0.36640286 0.3657105 ]
mean value: 0.37986819744110106
key: score_time
value: [0.01661921 0.01776004 0.01697946 0.0177021 0.01578593 0.01602459
0.01588058 0.0157671 0.01576114 0.01573563]
mean value: 0.01640157699584961
key: test_mcc
value: [0.83912942 0.85584416 0.89188312 0.92854828 0.84111937 0.82182846
0.83793444 0.91119237 0.855111 0.9104463 ]
mean value: 0.8693036918146279
key: train_mcc
value: [0.95598996 0.94994245 0.93796514 0.9439521 0.95192505 0.94599262
0.97006835 0.93608015 0.95217918 0.95005434]
mean value: 0.949414934136432
key: test_accuracy
value: [0.91891892 0.92792793 0.94594595 0.96396396 0.91891892 0.90990991
0.91891892 0.95495495 0.92727273 0.95454545]
mean value: 0.9341277641277641
key: train_accuracy
value: [0.9779338 0.97492477 0.96890672 0.97191575 0.97592778 0.97291876
0.98495486 0.96790371 0.9759519 0.9749499 ]
mean value: 0.974628796208264
key: test_fscore
value: [0.92035398 0.92727273 0.94545455 0.96428571 0.92307692 0.9137931
0.92035398 0.95652174 0.92857143 0.95575221]
mean value: 0.93554363582312
key: train_fscore
value: [0.97813121 0.97512438 0.96921549 0.972167 0.9760479 0.97313433
0.98507463 0.96825397 0.97623762 0.97517378]
mean value: 0.974856031535136
key: test_precision
value: [0.89655172 0.92727273 0.94545455 0.94736842 0.8852459 0.88333333
0.9122807 0.93220339 0.9122807 0.93103448]
mean value: 0.9173025928988414
key: train_precision
value: [0.9704142 0.96837945 0.96062992 0.96449704 0.9702381 0.96449704
0.97633136 0.95686275 0.96477495 0.96653543]
mean value: 0.9663160237353894
key: test_recall
value: [0.94545455 0.92727273 0.94545455 0.98181818 0.96428571 0.94642857
0.92857143 0.98214286 0.94545455 0.98181818]
mean value: 0.9548701298701299
key: train_recall
value: [0.98597194 0.98196393 0.97795591 0.97995992 0.98192771 0.98192771
0.9939759 0.97991968 0.98797595 0.98396794]
mean value: 0.9835546595198429
key: test_roc_auc
value: [0.91915584 0.92792208 0.94594156 0.96412338 0.91850649 0.90957792
0.91883117 0.95470779 0.92727273 0.95454545]
mean value: 0.9340584415584415
key: train_roc_auc
value: [0.97792573 0.97491771 0.96889763 0.97190767 0.9759338 0.97292778
0.9849639 0.96791575 0.9759519 0.9749499 ]
mean value: 0.9746291780347844
key: test_jcc
value: [0.85245902 0.86440678 0.89655172 0.93103448 0.85714286 0.84126984
0.85245902 0.91666667 0.86666667 0.91525424]
mean value: 0.8793911288378621
key: train_jcc
value: [0.95719844 0.95145631 0.94026975 0.94584139 0.95321637 0.94767442
0.97058824 0.93846154 0.95357834 0.95155039]
mean value: 0.9509835187210858
MCC on Blind test: 0.81
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
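Note on the OOB warnings above: the logged estimator is BaggingClassifier(n_jobs=10, oob_score=True, random_state=42), so n_estimators is left at the scikit-learn default of 10. With that few bootstrap draws, some training rows are in-bag for every estimator and never receive an out-of-bag vote. Their prediction sum is zero, which is what triggers the true_divide RuntimeWarning. The sketch below estimates how many rows are affected. It assumes each estimator draws as many rows as the training fold, with replacement (the default bootstrap settings), and assumes a fold of about 1,000 rows. The function names are hypothetical.

```python
def prob_never_oob(n_samples: int, n_estimators: int) -> float:
    """Probability that one row is in-bag for every estimator.

    A single bootstrap of size n_samples misses a given row with
    probability (1 - 1/n_samples) ** n_samples, roughly 0.368.
    """
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators


def expected_rows_without_oob(n_samples: int, n_estimators: int) -> float:
    """Expected number of training rows that get no OOB prediction."""
    return n_samples * prob_never_oob(n_samples, n_estimators)


if __name__ == "__main__":
    # 1000 rows per training fold is an assumption, not a logged value.
    for n_estimators in (10, 50, 100):
        rows = expected_rows_without_oob(1000, n_estimators)
        print(f"n_estimators={n_estimators}: ~{rows:.4g} rows without OOB score")
```

Under these assumptions, about ten rows per fold lack an OOB estimate at n_estimators=10, and effectively none at 100. Raising n_estimators, or dropping oob_score=True when the OOB estimate is not used, would likely silence both warnings.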
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.24564409 0.23522663 0.257339 0.24964762 0.25360632 0.24560642
0.24368262 0.23203063 0.23844266 0.23963594]
mean value: 0.2440861940383911
key: score_time
value: [0.03826308 0.04138947 0.02316594 0.02756476 0.03848481 0.02812314
0.03311872 0.02815628 0.0273447 0.04294181]
mean value: 0.03285527229309082
key: test_mcc
value: [0.78376623 0.78859019 0.89242811 0.94735177 0.83897362 0.82027988
0.856354 0.856354 0.89149871 0.92727273]
mean value: 0.8602869242483097
key: train_mcc
value: [0.99598796 1. 0.99200792 0.98597563 0.99598796 0.99198394
0.99200792 0.98997191 0.99398997 0.99599198]
mean value: 0.9933905192511064
key: test_accuracy
value: [0.89189189 0.89189189 0.94594595 0.97297297 0.91891892 0.90990991
0.92792793 0.92792793 0.94545455 0.96363636]
mean value: 0.9296478296478297
key: train_accuracy
value: [0.99799398 1. 0.99598796 0.99297894 0.99799398 0.99598796
0.99598796 0.99498495 0.99699399 0.99799599]
mean value: 0.9966905727201645
key: test_fscore
value: [0.89090909 0.89655172 0.94444444 0.97345133 0.92173913 0.9122807
0.92982456 0.92982456 0.94642857 0.96363636]
mean value: 0.9309090476986216
key: train_fscore
value: [0.99799599 1. 0.99597586 0.99300699 0.99799197 0.99599198
0.996 0.99498495 0.996997 0.99799599]
mean value: 0.9966940735806726
key: test_precision
value: [0.89090909 0.85245902 0.96226415 0.94827586 0.89830508 0.89655172
0.9137931 0.9137931 0.92982456 0.96363636]
mean value: 0.9169812061135013
key: train_precision
value: [0.99799599 1. 1. 0.99003984 0.99799197 0.994
0.99203187 0.99398798 0.996 0.99799599]
mean value: 0.9960043640938736
key: test_recall
value: [0.89090909 0.94545455 0.92727273 1. 0.94642857 0.92857143
0.94642857 0.94642857 0.96363636 0.96363636]
mean value: 0.9458766233766234
key: train_recall
value: [0.99799599 1. 0.99198397 0.99599198 0.99799197 0.99799197
1. 0.99598394 0.99799599 0.99799599]
mean value: 0.9973931799341655
key: test_roc_auc
value: [0.89188312 0.89237013 0.94577922 0.97321429 0.91866883 0.90974026
0.92775974 0.92775974 0.94545455 0.96363636]
mean value: 0.9296266233766234
key: train_roc_auc
value: [0.99799398 1. 0.99599198 0.99297591 0.99799398 0.99598997
0.99599198 0.99498596 0.99699399 0.99799599]
mean value: 0.9966913747173061
key: test_jcc
value: [0.80327869 0.8125 0.89473684 0.94827586 0.85483871 0.83870968
0.86885246 0.86885246 0.89830508 0.92982456]
mean value: 0.8718174343977652
key: train_jcc
value: [0.996 1. 0.99198397 0.98611111 0.99599198 0.99201597
0.99203187 0.99001996 0.99401198 0.996 ]
mean value: 0.9934166839716496
MCC on Blind test: 0.71
Accuracy on Blind test: 0.89
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.55467296 0.51923943 0.50953078 0.58113885 0.50787997 0.54556322
0.46287131 0.50236225 0.59216118 0.5354085 ]
mean value: 0.5310828447341919
key: score_time
value: [0.04145479 0.02731085 0.04424119 0.04497147 0.0242722 0.04455423
0.04341412 0.03534317 0.04339361 0.04950762]
mean value: 0.0398463249206543
key: test_mcc
value: [0.66058982 0.71955846 0.73090707 0.79230071 0.66254427 0.73247207
0.71884134 0.78818464 0.8187233 0.80119274]
mean value: 0.7425314425935436
key: train_mcc
value: [0.95007973 0.95796744 0.96802783 0.95792134 0.95796862 0.95593733
0.95994962 0.95606082 0.96004323 0.9580101 ]
mean value: 0.9581966061111651
key: test_accuracy
value: [0.82882883 0.85585586 0.86486486 0.89189189 0.82882883 0.86486486
0.85585586 0.89189189 0.90909091 0.9 ]
mean value: 0.8691973791973792
key: train_accuracy
value: [0.97492477 0.97893681 0.98395186 0.97893681 0.97893681 0.9779338
0.97993982 0.9779338 0.97995992 0.97895792]
mean value: 0.9790412319121694
key: test_fscore
value: [0.83478261 0.86440678 0.86725664 0.89830508 0.84033613 0.87179487
0.86666667 0.89830508 0.91071429 0.89719626]
mean value: 0.8749764415328185
key: train_fscore
value: [0.97522299 0.97910448 0.98409543 0.97906281 0.97906281 0.97804391
0.98003992 0.97813121 0.98011928 0.97910448]
mean value: 0.9791987328205536
key: test_precision
value: [0.8 0.80952381 0.84482759 0.84126984 0.79365079 0.83606557
0.8125 0.85483871 0.89473684 0.92307692]
mean value: 0.8410490079281439
key: train_precision
value: [0.96470588 0.97233202 0.97633136 0.97420635 0.97227723 0.97222222
0.97420635 0.96850394 0.97238659 0.97233202]
mean value: 0.971950394805701
key: test_recall
value: [0.87272727 0.92727273 0.89090909 0.96363636 0.89285714 0.91071429
0.92857143 0.94642857 0.92727273 0.87272727]
mean value: 0.9133116883116883
key: train_recall
value: [0.98597194 0.98597194 0.99198397 0.98396794 0.98594378 0.98393574
0.98594378 0.98795181 0.98797595 0.98597194]
mean value: 0.9865618787776356
key: test_roc_auc
value: [0.82922078 0.85649351 0.8650974 0.89253247 0.82824675 0.86444805
0.85519481 0.8913961 0.90909091 0.9 ]
mean value: 0.8691720779220778
key: train_roc_auc
value: [0.97491368 0.97892975 0.98394379 0.97893176 0.97894383 0.97793982
0.97994584 0.97794384 0.97995992 0.97895792]
mean value: 0.9790410137544164
key: test_jcc
value: [0.71641791 0.76119403 0.765625 0.81538462 0.72463768 0.77272727
0.76470588 0.81538462 0.83606557 0.81355932]
mean value: 0.7785701903111762
key: train_jcc
value: [0.9516441 0.95906433 0.96868885 0.95898438 0.95898438 0.95703125
0.96086106 0.95719844 0.96101365 0.95906433]
mean value: 0.959253474650761
MCC on Blind test: 0.54
Accuracy on Blind test: 0.81
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.71389413 1.69086385 1.72496676 1.6965642 1.69422984 1.68370032
1.69756031 1.69427085 1.6981957 1.68466806]
mean value: 1.697891402244568
key: score_time
value: [0.01014042 0.00983286 0.00966883 0.00983334 0.01002479 0.00973153
0.00976062 0.00952935 0.01008153 0.00966859]
mean value: 0.009827184677124023
key: test_mcc
value: [0.85816689 0.86102173 0.87398511 0.93038564 0.88077101 0.85798501
0.83793444 0.91355091 0.87402845 0.92973479]
mean value: 0.8817563969991022
key: train_mcc
value: [0.99200779 0.99198387 0.98799559 0.98998777 0.98402331 0.99200792
0.99200792 0.98803559 0.99201584 0.99400594]
mean value: 0.9904071544165598
key: test_accuracy
value: [0.92792793 0.92792793 0.93693694 0.96396396 0.93693694 0.92792793
0.91891892 0.95495495 0.93636364 0.96363636]
mean value: 0.9395495495495495
key: train_accuracy
value: [0.99598796 0.99598796 0.99398195 0.99498495 0.99197593 0.99598796
0.99598796 0.99398195 0.99599198 0.99699399]
mean value: 0.9951862601833557
key: test_fscore
value: [0.92982456 0.93103448 0.93577982 0.96491228 0.94117647 0.93103448
0.92035398 0.95726496 0.9380531 0.96491228]
mean value: 0.9414346412337231
key: train_fscore
value: [0.99600798 0.996 0.99401198 0.995005 0.99201597 0.996
0.996 0.99401198 0.99600798 0.997003 ]
mean value: 0.9952063880231545
key: test_precision
value: [0.89830508 0.8852459 0.94444444 0.93220339 0.88888889 0.9
0.9122807 0.91803279 0.9137931 0.93220339]
mean value: 0.9125397691467365
key: train_precision
value: [0.99204771 0.99401198 0.99005964 0.99203187 0.98611111 0.99203187
0.99203187 0.98809524 0.99204771 0.9940239 ]
mean value: 0.9912492916749109
key: test_recall
value: [0.96363636 0.98181818 0.92727273 1. 1. 0.96428571
0.92857143 1. 0.96363636 1. ]
mean value: 0.972922077922078
key: train_recall
value: [1. 0.99799599 0.99799599 0.99799599 0.99799197 1.
1. 1. 1. 1. ]
mean value: 0.999197994382339
key: test_roc_auc
value: [0.92824675 0.92840909 0.93685065 0.96428571 0.93636364 0.9275974
0.91883117 0.95454545 0.93636364 0.96363636]
mean value: 0.939512987012987
key: train_roc_auc
value: [0.99598394 0.99598595 0.99397792 0.99498193 0.99198196 0.99599198
0.99599198 0.99398798 0.99599198 0.99699399]
mean value: 0.9951869602659134
key: test_jcc
value: [0.86885246 0.87096774 0.87931034 0.93220339 0.88888889 0.87096774
0.85245902 0.91803279 0.88333333 0.93220339]
mean value: 0.8897219092876875
key: train_jcc
value: [0.99204771 0.99203187 0.98809524 0.99005964 0.98415842 0.99203187
0.99203187 0.98809524 0.99204771 0.9940239 ]
mean value: 0.9904623483526916
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04374695 0.06547046 0.04631853 0.04311037 0.04469585 0.04323721
0.04338408 0.04668546 0.04348207 0.04395962]
mean value: 0.04640905857086182
key: score_time
value: [0.01359057 0.01312923 0.01343632 0.01324058 0.01333547 0.01309013
0.01344275 0.01334 0.01368546 0.01321197]
mean value: 0.013350248336791992
key: test_mcc
value: [0.47871355 0.3222257 0.33903271 0.35540963 0.25544091 0.41410537
0.41410537 0.47304992 0.30976699 0.45693677]
mean value: 0.38187869064725444
key: train_mcc
value: [0.44253373 0.43112172 0.44739295 0.41128449 0.40377368 0.40879303
0.43189143 0.40544978 0.39041637 0.43206773]
mean value: 0.42047249016488764
key: test_accuracy
value: [0.68468468 0.6036036 0.61261261 0.62162162 0.58558559 0.64864865
0.64864865 0.68468468 0.6 0.67272727]
mean value: 0.6362817362817363
key: train_accuracy
value: [0.66399198 0.65697091 0.667001 0.6449348 0.63991976 0.64292879
0.65697091 0.64092277 0.63226453 0.65731463]
mean value: 0.6503220081084938
key: test_fscore
value: [0.75862069 0.71052632 0.71523179 0.72 0.7012987 0.74172185
0.74172185 0.76190476 0.71052632 0.75342466]
mean value: 0.7314976938660571
key: train_fscore
value: [0.74868717 0.74477612 0.75037594 0.73816568 0.73505535 0.73668639
0.74439462 0.73559823 0.73113553 0.74477612]
mean value: 0.7409651149451728
key: test_precision
value: [0.61111111 0.55670103 0.5625 0.56842105 0.55102041 0.58947368
0.58947368 0.61538462 0.55670103 0.6043956 ]
mean value: 0.5805182221962898
key: train_precision
value: [0.59832134 0.59334126 0.60048135 0.58499414 0.58109685 0.58313817
0.59285714 0.5817757 0.57621247 0.59334126]
mean value: 0.5885559687543657
key: test_recall
value: [1. 0.98181818 0.98181818 0.98181818 0.96428571 1.
1. 1. 0.98181818 1. ]
mean value: 0.9891558441558441
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.6875 0.60698052 0.61590909 0.62483766 0.58214286 0.64545455
0.64545455 0.68181818 0.6 0.67272727]
mean value: 0.6362824675324675
key: train_roc_auc
value: [0.66365462 0.65662651 0.66666667 0.64457831 0.64028056 0.64328657
0.65731463 0.64128257 0.63226453 0.65731463]
mean value: 0.6503269591391618
key: test_jcc
value: [0.61111111 0.55102041 0.55670103 0.5625 0.54 0.58947368
0.58947368 0.61538462 0.55102041 0.6043956 ]
mean value: 0.5771080546566749
key: train_jcc
value: [0.59832134 0.59334126 0.60048135 0.58499414 0.58109685 0.58313817
0.59285714 0.5817757 0.57621247 0.59334126]
mean value: 0.5885559687543657
MCC on Blind test: 0.14
Accuracy on Blind test: 0.43
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02749586 0.04280066 0.02907753 0.03794432 0.03937221 0.03592968
0.04454541 0.04316807 0.04792213 0.0542562 ]
mean value: 0.04025120735168457
key: score_time
value: [0.01947474 0.02240396 0.02471495 0.02588439 0.02503633 0.01949716
0.01938891 0.01940846 0.01542902 0.02859068]
mean value: 0.021982860565185548
key: test_mcc
value: [0.66058982 0.78434561 0.81980519 0.80845318 0.80802876 0.73912573
0.7306455 0.78087736 0.78651226 0.85681441]
mean value: 0.7775197836320717
key: train_mcc
value: [0.83473792 0.82049509 0.81956135 0.81892375 0.82857913 0.82239706
0.819597 0.81679117 0.82443086 0.8253123 ]
mean value: 0.8230825643597578
key: test_accuracy
value: [0.82882883 0.89189189 0.90990991 0.9009009 0.9009009 0.86486486
0.86486486 0.88288288 0.89090909 0.92727273]
mean value: 0.8863226863226863
key: train_accuracy
value: [0.91574724 0.90872618 0.90772317 0.90772317 0.91273821 0.90972919
0.90772317 0.90672016 0.91082164 0.91082164]
mean value: 0.909847377804757
key: test_fscore
value: [0.83478261 0.89285714 0.90909091 0.90598291 0.90756303 0.87603306
0.86956522 0.89430894 0.89655172 0.92982456]
mean value: 0.8916560095710109
key: train_fscore
value: [0.9193858 0.91258405 0.91221374 0.91187739 0.91626564 0.9132948
0.91204589 0.91066282 0.91434071 0.91483254]
mean value: 0.9137503384577215
key: test_precision
value: [0.8 0.87719298 0.90909091 0.85483871 0.85714286 0.81538462
0.84745763 0.82089552 0.85245902 0.89830508]
mean value: 0.8532767324397851
key: train_precision
value: [0.88213628 0.87638376 0.87067395 0.8733945 0.87985213 0.87777778
0.87043796 0.87292818 0.87962963 0.87545788]
mean value: 0.8758672033376387
key: test_recall
value: [0.87272727 0.90909091 0.90909091 0.96363636 0.96428571 0.94642857
0.89285714 0.98214286 0.94545455 0.96363636]
mean value: 0.934935064935065
key: train_recall
value: [0.95991984 0.95190381 0.95791583 0.95390782 0.95582329 0.95180723
0.95783133 0.95180723 0.95190381 0.95791583]
mean value: 0.9550736010172955
key: test_roc_auc
value: [0.82922078 0.89204545 0.9099026 0.90146104 0.90032468 0.86412338
0.86461039 0.88198052 0.89090909 0.92727273]
mean value: 0.8861850649350649
key: train_roc_auc
value: [0.91570289 0.90868283 0.90767278 0.9076768 0.91278139 0.90977135
0.90777338 0.90676534 0.91082164 0.91082164]
mean value: 0.9098470032434346
key: test_jcc
value: [0.71641791 0.80645161 0.83333333 0.828125 0.83076923 0.77941176
0.76923077 0.80882353 0.8125 0.86885246]
mean value: 0.8053915609818361
key: train_jcc
value: [0.85079929 0.83922261 0.83859649 0.83802817 0.84547069 0.84042553
0.83831283 0.83597884 0.84219858 0.84303351]
mean value: 0.8412066546000827
MCC on Blind test: 0.61
Accuracy on Blind test: 0.83
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.31536579 0.30559421 0.21722341 0.35626459 0.46322298 0.34366488
0.35091496 0.33481693 0.30926394 0.29198647]
mean value: 0.3288318157196045
key: score_time
value: [0.01272392 0.01928926 0.01933956 0.02210021 0.01939154 0.01926899
0.01933599 0.02292752 0.0124402 0.02368855]
mean value: 0.01905057430267334
key: test_mcc
value: [0.66058982 0.76698119 0.78434561 0.80845318 0.80802876 0.75530907
0.74951538 0.78087736 0.78651226 0.85681441]
mean value: 0.7757427039457268
key: train_mcc
value: [0.83473792 0.82940212 0.83847423 0.83819631 0.82857913 0.83936409
0.838222 0.81679117 0.82443086 0.8253123 ]
mean value: 0.8313510135109393
key: test_accuracy
value: [0.82882883 0.88288288 0.89189189 0.9009009 0.9009009 0.87387387
0.87387387 0.88288288 0.89090909 0.92727273]
mean value: 0.8854217854217854
key: train_accuracy
value: [0.91574724 0.91374122 0.91775326 0.91775326 0.91273821 0.91875627
0.91775326 0.90672016 0.91082164 0.91082164]
mean value: 0.9142606175239144
key: test_fscore
value: [0.83478261 0.88495575 0.89285714 0.90598291 0.90756303 0.88333333
0.87931034 0.89430894 0.89655172 0.92982456]
mean value: 0.8909470341749964
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.9193858 0.91666667 0.92115385 0.92100193 0.91626564 0.9212828
0.92084942 0.91066282 0.91434071 0.91483254]
mean value: 0.9176442168185582
key: test_precision
value: [0.8 0.86206897 0.87719298 0.85483871 0.85714286 0.828125
0.85 0.82089552 0.85245902 0.89830508]
mean value: 0.8501028138320923
key: train_precision
value: [0.88213628 0.88742964 0.88539741 0.88682746 0.87985213 0.89265537
0.8866171 0.87292818 0.87962963 0.87545788]
mean value: 0.8828931069088831
key: test_recall
value: [0.87272727 0.90909091 0.90909091 0.96363636 0.96428571 0.94642857
0.91071429 0.98214286 0.94545455 0.96363636]
mean value: 0.9367207792207792
key: train_recall
value: [0.95991984 0.94789579 0.95991984 0.95791583 0.95582329 0.95180723
0.95783133 0.95180723 0.95190381 0.95791583]
mean value: 0.9552740018188988
key: test_roc_auc
value: [0.82922078 0.88311688 0.89204545 0.90146104 0.90032468 0.87321429
0.87353896 0.88198052 0.89090909 0.92727273]
mean value: 0.8853084415584416
key: train_roc_auc
value: [0.91570289 0.91370693 0.91771092 0.91771294 0.91278139 0.91878939
0.91779342 0.90676534 0.91082164 0.91082164]
mean value: 0.9142606498136836
key: test_jcc
value: [0.71641791 0.79365079 0.80645161 0.828125 0.83076923 0.79104478
0.78461538 0.80882353 0.8125 0.86885246]
mean value: 0.8041250696933957
key: train_jcc
value: [0.85079929 0.84615385 0.85383244 0.85357143 0.84547069 0.85405405
0.85330948 0.83597884 0.84219858 0.84303351]
mean value: 0.847840216154083
MCC on Blind test: 0.62
Accuracy on Blind test: 0.84
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04093432 0.04260874 0.05253553 0.05427361 0.04360962 0.04467535
0.04421616 0.05421448 0.04414415 0.04397631]
mean value: 0.04651882648468018
key: score_time
value: [0.01233125 0.0134294 0.01342249 0.01335311 0.0134294 0.01324058
0.01339912 0.01341963 0.01598525 0.01339984]
mean value: 0.013541007041931152
key: test_mcc
value: [0.60357143 0.82205752 0.72978244 0.69959151 0.78818464 0.66597107
0.73247207 0.69891539 0.80332642 0.7823356 ]
mean value: 0.7326208073785166
key: train_mcc
value: [0.78495504 0.77672016 0.79042861 0.78136227 0.77254608 0.79725623
0.77694243 0.78307581 0.77994086 0.76055213]
mean value: 0.7803779608362683
key: test_accuracy
value: [0.8018018 0.90990991 0.86486486 0.84684685 0.89189189 0.82882883
0.86486486 0.84684685 0.9 0.89090909]
mean value: 0.8646764946764947
key: train_accuracy
value: [0.89167503 0.88766299 0.89368104 0.88966901 0.88565697 0.89769308
0.88766299 0.89067202 0.88877756 0.87975952]
mean value: 0.889291019350637
key: test_fscore
value: [0.8 0.9122807 0.86238532 0.85470085 0.89830508 0.84297521
0.87179487 0.85714286 0.90434783 0.89285714]
mean value: 0.8696789866795319
key: train_fscore
value: [0.89514563 0.89105058 0.89827255 0.89361702 0.88867188 0.90097087
0.89105058 0.89407191 0.89296046 0.8828125 ]
mean value: 0.8928623998583001
key: test_precision
value: [0.8 0.88135593 0.87037037 0.80645161 0.85483871 0.78461538
0.83606557 0.80952381 0.86666667 0.87719298]
mean value: 0.8387081042186898
key: train_precision
value: [0.86817326 0.8657845 0.86187845 0.8635514 0.86501901 0.87218045
0.86415094 0.86629002 0.8605948 0.86095238]
mean value: 0.8648575213221116
key: test_recall
value: [0.8 0.94545455 0.85454545 0.90909091 0.94642857 0.91071429
0.91071429 0.91071429 0.94545455 0.90909091]
mean value: 0.9042207792207793
key: train_recall
value: [0.9238477 0.91783567 0.93787575 0.9258517 0.91365462 0.93172691
0.91967871 0.92369478 0.92785571 0.90581162]
mean value: 0.9227833176392947
key: test_roc_auc
value: [0.80178571 0.91022727 0.86477273 0.8474026 0.8913961 0.82808442
0.86444805 0.84626623 0.9 0.89090909]
mean value: 0.8645292207792208
key: train_roc_auc
value: [0.89164272 0.8876327 0.89363667 0.88963268 0.88568502 0.89772718
0.88769507 0.8907051 0.88877756 0.87975952]
mean value: 0.8892894222179298
key: test_jcc
value: [0.66666667 0.83870968 0.75806452 0.74626866 0.81538462 0.72857143
0.77272727 0.75 0.82539683 0.80645161]
mean value: 0.770824127191484
key: train_jcc
value: [0.81019332 0.80350877 0.81533101 0.80769231 0.79964851 0.81978799
0.80350877 0.80843585 0.80662021 0.79020979]
mean value: 0.8064936527280264
MCC on Blind test: 0.64
Accuracy on Blind test: 0.84
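
Note on the test_fscore and test_jcc rows above: for a binary problem scored on the same positive class, the Jaccard index J and the F1 score F are linked by J = F / (2 - F), so the two rows carry the same information. The sketch below is only a consistency check, not part of the original script. It copies the ten per-fold values from the Logistic Regression block and confirms that they agree with this identity and with the printed mean. The helper name jaccard_from_f1 is hypothetical.

```python
# Per-fold values copied from the Logistic Regression block of this log.
test_fscore = [0.8, 0.9122807, 0.86238532, 0.85470085, 0.89830508,
               0.84297521, 0.87179487, 0.85714286, 0.90434783, 0.89285714]
test_jcc = [0.66666667, 0.83870968, 0.75806452, 0.74626866, 0.81538462,
            0.72857143, 0.77272727, 0.75, 0.82539683, 0.80645161]


def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index: J = F / (2 - F)."""
    return f1 / (2.0 - f1)


# Derive Jaccard from each fold's F1 and compare with the logged Jaccard.
derived_jcc = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))

# "mean value" in the log is the arithmetic mean over the 10 folds.
mean_jcc = sum(test_jcc) / len(test_jcc)

print(f"largest |derived - logged| Jaccard difference: {max_abs_diff:.2e}")
print(f"mean test_jcc recomputed from the folds: {mean_jcc:.6f}")
```

Any small difference comes from the eight-decimal rounding of the printed fold values.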
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
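
Note on the ConvergenceWarning above: the warning offers two remedies, scaling the data or raising max_iter. The Logistic Regression pipeline printed earlier in this log already applies MinMaxScaler to the numerical columns, so if LogisticRegressionCV runs inside the same kind of pipeline (an assumption, since its pipeline is not shown in this part of the log), raising max_iter is the more likely fix. The sketch below is a minimal illustration on synthetic data, not the original script. The dataset, the value 5000 and the variable names are all assumptions.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (assumption: binary target).
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Same two-step layout as the logged pipeline, but with a larger iteration
# budget for lbfgs than the scikit-learn default of 100.
pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegressionCV(max_iter=5000, random_state=42)),
])

# Record warnings during fitting so convergence problems can be counted
# instead of flooding the log.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", category=ConvergenceWarning)
    pipe.fit(X, y)

n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)
print("ConvergenceWarning count:", n_convergence_warnings)
```

If warnings persist on the real data with a larger max_iter, the alternative solvers listed in the linked scikit-learn documentation are the next thing to try.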
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.15100551 1.03059697 1.11383557 1.17403126 1.12681389 1.05125737
1.17251849 1.03575873 1.04927802 1.27532935]
mean value: 1.118042516708374
key: score_time
value: [0.01579118 0.01581717 0.01901102 0.01574683 0.01594114 0.01595163
0.02020645 0.01380491 0.01625299 0.01591539]
mean value: 0.016443872451782228
key: test_mcc
value: [0.62175325 0.87402597 0.82182846 0.69959151 0.8049036 0.73247207
0.82027988 0.75979502 0.69102332 0.7499303 ]
mean value: 0.7575603381593177
key: train_mcc
value: [0.81462589 0.87037589 0.85431279 0.85417564 0.84386768 0.86027684
0.84810102 0.86845461 0.8723112 0.87219196]
mean value: 0.8558693517884952
key: test_accuracy
value: [0.81081081 0.93693694 0.90990991 0.84684685 0.9009009 0.86486486
0.90990991 0.87387387 0.84545455 0.87272727]
mean value: 0.8772235872235872
key: train_accuracy
value: [0.90672016 0.93480441 0.92678034 0.92678034 0.9217653 0.92978937
0.92377131 0.9338014 0.93587174 0.93587174]
mean value: 0.9275956124887689
key: test_fscore
value: [0.81081081 0.93693694 0.90566038 0.85470085 0.90598291 0.87179487
0.9122807 0.8852459 0.84684685 0.87931034]
mean value: 0.8809570552653033
key: train_fscore
value: [0.90926829 0.93621197 0.92836114 0.92822026 0.92277228 0.93110236
0.92504931 0.93516699 0.93700787 0.93688363]
mean value: 0.9290044105640144
key: test_precision
value: [0.80357143 0.92857143 0.94117647 0.80645161 0.86885246 0.83606557
0.89655172 0.81818182 0.83928571 0.83606557]
mean value: 0.8574773803797159
key: train_precision
value: [0.88593156 0.91730769 0.90961538 0.91119691 0.91015625 0.91312741
0.90891473 0.91538462 0.92069632 0.9223301 ]
mean value: 0.9114660976288571
key: test_recall
value: [0.81818182 0.94545455 0.87272727 0.90909091 0.94642857 0.91071429
0.92857143 0.96428571 0.85454545 0.92727273]
mean value: 0.9077272727272727
key: train_recall
value: [0.93386774 0.95591182 0.94789579 0.94589178 0.93574297 0.9497992
0.94176707 0.95582329 0.95390782 0.95190381]
mean value: 0.9472511287635512
key: test_roc_auc
value: [0.81087662 0.93701299 0.90957792 0.8474026 0.90048701 0.86444805
0.90974026 0.87305195 0.84545455 0.87272727]
mean value: 0.877077922077922
key: train_roc_auc
value: [0.9066929 0.93478322 0.92675914 0.92676115 0.9217793 0.92980942
0.92378935 0.93382347 0.93587174 0.93587174]
mean value: 0.9275941441115163
key: test_jcc
value: [0.68181818 0.88135593 0.82758621 0.74626866 0.828125 0.77272727
0.83870968 0.79411765 0.734375 0.78461538]
mean value: 0.7889698959455377
key: train_jcc
value: [0.83363148 0.8800738 0.86630037 0.86605505 0.85661765 0.87108656
0.86055046 0.87822878 0.88148148 0.8812616 ]
mean value: 0.8675287218964672
MCC on Blind test: 0.63
Accuracy on Blind test: 0.85
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01754236 0.01333523 0.01295114 0.01299715 0.01282549 0.013062
0.01278353 0.01383567 0.01366758 0.01300454]
mean value: 0.013600468635559082
key: score_time
value: [0.01324677 0.00990438 0.00989175 0.00963712 0.01019049 0.01029158
0.01033616 0.00978732 0.01000738 0.01002502]
mean value: 0.01033179759979248
key: test_mcc
value: [0.51517746 0.4775974 0.46159963 0.49641957 0.67720229 0.51815539
0.51398927 0.41165822 0.56363636 0.6401844 ]
mean value: 0.527562000397169
key: train_mcc
value: [0.53949298 0.54988148 0.54586088 0.56805192 0.55221636 0.54760475
0.5415118 0.5415118 0.55760823 0.53774946]
mean value: 0.548148963922759
key: test_accuracy
value: [0.75675676 0.73873874 0.72972973 0.74774775 0.83783784 0.75675676
0.75675676 0.7027027 0.78181818 0.81818182]
mean value: 0.7627027027027027
key: train_accuracy
value: [0.76930792 0.77432297 0.77231695 0.78335005 0.77432297 0.77331996
0.77031093 0.77031093 0.77855711 0.76853707]
mean value: 0.7734656876440946
key: test_fscore
value: [0.74285714 0.73873874 0.71153846 0.73584906 0.84482759 0.74285714
0.76521739 0.67961165 0.78181818 0.80769231]
mean value: 0.755100766010243
key: train_fscore
value: [0.7628866 0.76683938 0.76476684 0.77593361 0.76038339 0.76604555
0.76318511 0.76318511 0.77379734 0.76258993]
mean value: 0.7659612844765213
key: test_precision
value: [0.78 0.73214286 0.75510204 0.76470588 0.81666667 0.79591837
0.74576271 0.74468085 0.78181818 0.85714286]
mean value: 0.7773940416215006
key: train_precision
value: [0.78556263 0.79399142 0.79184549 0.80430108 0.80952381 0.79059829
0.78678038 0.78678038 0.79079498 0.78270042]
mean value: 0.7922878886569598
key: test_recall
value: [0.70909091 0.74545455 0.67272727 0.70909091 0.875 0.69642857
0.78571429 0.625 0.78181818 0.76363636]
mean value: 0.7363961038961039
key: train_recall
value: [0.74148297 0.74148297 0.73947896 0.749499 0.71686747 0.74297189
0.74096386 0.74096386 0.75751503 0.74348697]
mean value: 0.741471296005666
key: test_roc_auc
value: [0.75633117 0.7387987 0.72922078 0.7474026 0.8375 0.75730519
0.75649351 0.70340909 0.78181818 0.81818182]
mean value: 0.7626461038961039
key: train_roc_auc
value: [0.76933586 0.77435594 0.77234992 0.78338404 0.7742654 0.77328955
0.77028153 0.77028153 0.77855711 0.76853707]
mean value: 0.7734637950599995
key: test_jcc
value: [0.59090909 0.58571429 0.55223881 0.58208955 0.73134328 0.59090909
0.61971831 0.51470588 0.64179104 0.67741935]
mean value: 0.6086838701150438
key: train_jcc
value: [0.61666667 0.62184874 0.61912752 0.63389831 0.61340206 0.62080537
0.61705686 0.61705686 0.63105175 0.61627907]
mean value: 0.6207193194072481
MCC on Blind test: 0.44
Accuracy on Blind test: 0.76
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01254296 0.01719975 0.01711202 0.01715064 0.01714396 0.01716685
0.0172646 0.01705408 0.01714993 0.01713777]
mean value: 0.016692256927490233
key: score_time
value: [0.01235557 0.01248646 0.01242685 0.01279688 0.01244903 0.01248336
0.01246428 0.01228619 0.01236367 0.01239777]
mean value: 0.01245100498199463
key: test_mcc
value: [0.53168696 0.53168696 0.5365027 0.60383519 0.64303575 0.6048892
0.65842676 0.60383519 0.71097366 0.62075223]
mean value: 0.6045624604926498
key: train_mcc
value: [0.62690345 0.63122939 0.61741439 0.64894709 0.632912 0.6491123
0.62326104 0.64713651 0.64529188 0.60949387]
mean value: 0.6331701922310734
key: test_accuracy
value: [0.76576577 0.76576577 0.76576577 0.8018018 0.81981982 0.8018018
0.82882883 0.8018018 0.85454545 0.80909091]
mean value: 0.8014987714987715
key: train_accuracy
value: [0.81344032 0.81544634 0.80842528 0.82447342 0.81644935 0.82447342
0.8114343 0.82347041 0.82264529 0.80460922]
mean value: 0.8164867347533582
key: test_fscore
value: [0.75925926 0.75925926 0.74509804 0.7962963 0.83050847 0.81034483
0.83478261 0.80701754 0.85964912 0.8 ]
mean value: 0.8002215431555298
key: train_fscore
value: [0.81287726 0.81262729 0.80450358 0.82482482 0.81681682 0.82621648
0.80777096 0.82539683 0.8224674 0.80162767]
mean value: 0.815512912261371
key: test_precision
value: [0.77358491 0.77358491 0.80851064 0.81132075 0.79032258 0.78333333
0.81355932 0.79310345 0.83050847 0.84 ]
mean value: 0.8017828363200135
key: train_precision
value: [0.81616162 0.82608696 0.82217573 0.824 0.81437126 0.8172888
0.82291667 0.81568627 0.82329317 0.81404959]
mean value: 0.819603006460176
key: test_recall
value: [0.74545455 0.74545455 0.69090909 0.78181818 0.875 0.83928571
0.85714286 0.82142857 0.89090909 0.76363636]
mean value: 0.8011038961038961
key: train_recall
value: [0.80961924 0.7995992 0.78757515 0.8256513 0.81927711 0.83534137
0.79317269 0.83534137 0.82164329 0.78957916]
mean value: 0.8116799864789821
key: test_roc_auc
value: [0.76558442 0.76558442 0.7650974 0.80162338 0.81931818 0.80146104
0.82857143 0.80162338 0.85454545 0.80909091]
mean value: 0.80125
key: train_roc_auc
value: [0.81344416 0.81546225 0.80844621 0.82447224 0.81645218 0.82448431
0.811416 0.82348231 0.82264529 0.80460922]
mean value: 0.8164914165680759
key: test_jcc
value: [0.6119403 0.6119403 0.59375 0.66153846 0.71014493 0.68115942
0.71641791 0.67647059 0.75384615 0.66666667]
mean value: 0.668387472557535
key: train_jcc
value: [0.68474576 0.68439108 0.67294521 0.70187394 0.69035533 0.70389171
0.67753002 0.7027027 0.69846678 0.66893039]
mean value: 0.6885832913576179
MCC on Blind test: 0.5
Accuracy on Blind test: 0.79
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01590347 0.01198936 0.01165318 0.01147985 0.01264477 0.01251578
0.01235557 0.01222801 0.01148844 0.01146007]
mean value: 0.01237185001373291
key: score_time
value: [0.04040146 0.0159626 0.0156765 0.01903772 0.01762676 0.01922369
0.01665974 0.01898384 0.01707697 0.02237201]
mean value: 0.020302128791809083
key: test_mcc
value: [0.70340005 0.47838827 0.57407396 0.48186817 0.6048892 0.54348795
0.69757747 0.4778799 0.62325024 0.56400939]
mean value: 0.5748824590071774
key: train_mcc
value: [0.73890566 0.71341062 0.73549304 0.76812618 0.73773179 0.74176802
0.75322467 0.70564267 0.73393064 0.72957779]
mean value: 0.7357811061752336
key: test_accuracy
value: [0.84684685 0.73873874 0.78378378 0.73873874 0.8018018 0.76576577
0.83783784 0.73873874 0.80909091 0.78181818]
mean value: 0.7843161343161343
key: train_accuracy
value: [0.86760281 0.8555667 0.86559679 0.88164493 0.8665998 0.86860582
0.87462387 0.85155466 0.86472946 0.86272545]
mean value: 0.8659250295978115
key: test_fscore
value: [0.85714286 0.74336283 0.79661017 0.75213675 0.81034483 0.79032258
0.85714286 0.74782609 0.82051282 0.78571429]
mean value: 0.7961116069187395
key: train_fscore
value: [0.8740458 0.86127168 0.87262357 0.88804554 0.87345385 0.8753568
0.88061127 0.85741811 0.87179487 0.86964795]
mean value: 0.8724269457459887
key: test_precision
value: [0.796875 0.72413793 0.74603175 0.70967742 0.78333333 0.72058824
0.77142857 0.72881356 0.77419355 0.77192982]
mean value: 0.7527009168747624
key: train_precision
value: [0.83424408 0.82931354 0.83001808 0.84324324 0.83001808 0.8318264
0.83970856 0.82407407 0.82851986 0.82789855]
mean value: 0.8318864476214571
key: test_recall
value: [0.92727273 0.76363636 0.85454545 0.8 0.83928571 0.875
0.96428571 0.76785714 0.87272727 0.8 ]
mean value: 0.846461038961039
key: train_recall
value: [0.91783567 0.89579158 0.91983968 0.93787575 0.92168675 0.92369478
0.92570281 0.8935743 0.91983968 0.91583166]
mean value: 0.9171672662594265
key: test_roc_auc
value: [0.84756494 0.73896104 0.78441558 0.73928571 0.80146104 0.76477273
0.83668831 0.73847403 0.80909091 0.78181818]
mean value: 0.7842532467532468
key: train_roc_auc
value: [0.86755237 0.85552631 0.86554233 0.88158848 0.866655 0.86866102
0.87467505 0.85159677 0.86472946 0.86272545]
mean value: 0.8659252239418596
key: test_jcc
value: [0.75 0.5915493 0.66197183 0.60273973 0.68115942 0.65333333
0.75 0.59722222 0.69565217 0.64705882]
mean value: 0.6630686826075827
key: train_jcc
value: [0.77627119 0.75634518 0.77403035 0.79863481 0.77533784 0.77834179
0.78668942 0.75042159 0.77272727 0.76936027]
mean value: 0.7738159708974901
MCC on Blind test: 0.43
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.05335617 0.05310178 0.0520792 0.05209589 0.06355214 0.05141449
0.06059766 0.06146669 0.06533003 0.06051207]
mean value: 0.05735061168670654
key: score_time
value: [0.01984859 0.01980829 0.01976943 0.01942945 0.02001691 0.01954556
0.02005029 0.02004004 0.02193451 0.01983261]
mean value: 0.020027565956115722
key: test_mcc
value: [0.61044509 0.77216596 0.67619361 0.66058982 0.82447186 0.5928164
0.71335883 0.64590588 0.73720978 0.82035423]
mean value: 0.7053511460043415
key: train_mcc
value: [0.75534914 0.75786007 0.75012721 0.76480992 0.74430166 0.75943336
0.74082414 0.7473325 0.72307793 0.74636831]
mean value: 0.7489484245836903
key: test_accuracy
value: [0.8018018 0.88288288 0.83783784 0.82882883 0.90990991 0.79279279
0.85585586 0.81981982 0.86363636 0.90909091]
mean value: 0.8502457002457002
key: train_accuracy
value: [0.87562688 0.87662989 0.87261785 0.87963892 0.86960883 0.8776329
0.86860582 0.87161484 0.85971944 0.87074148]
mean value: 0.8722436849627038
key: test_fscore
value: [0.81355932 0.88888889 0.83928571 0.83478261 0.91525424 0.80991736
0.86206897 0.83333333 0.87394958 0.9122807 ]
mean value: 0.8583320707001083
key: train_fscore
value: [0.88190476 0.88319088 0.87962085 0.88657845 0.87666034 0.88358779
0.87464115 0.8778626 0.86641221 0.87772512]
mean value: 0.8788184151866292
key: test_precision
value: [0.76190476 0.83870968 0.8245614 0.8 0.87096774 0.75384615
0.83333333 0.78125 0.8125 0.88135593]
mean value: 0.815842900415125
key: train_precision
value: [0.84029038 0.83935018 0.83453237 0.83899821 0.83093525 0.84181818
0.83546618 0.83636364 0.82695811 0.83273381]
mean value: 0.8357446314558294
key: test_recall
value: [0.87272727 0.94545455 0.85454545 0.87272727 0.96428571 0.875
0.89285714 0.89285714 0.94545455 0.94545455]
mean value: 0.9061363636363636
key: train_recall
value: [0.92785571 0.93186373 0.92985972 0.93987976 0.92771084 0.92971888
0.91767068 0.92369478 0.90981964 0.92785571]
mean value: 0.9265929449259966
key: test_roc_auc
value: [0.80243506 0.88344156 0.83798701 0.82922078 0.90941558 0.79204545
0.85551948 0.81915584 0.86363636 0.90909091]
mean value: 0.8501948051948052
key: train_roc_auc
value: [0.87557444 0.87657443 0.87256038 0.87957843 0.86966704 0.87768509
0.86865498 0.87166703 0.85971944 0.87074148]
mean value: 0.8722422757160908
key: test_jcc
value: [0.68571429 0.8 0.72307692 0.71641791 0.84375 0.68055556
0.75757576 0.71428571 0.7761194 0.83870968]
mean value: 0.7536205227060427
key: train_jcc
value: [0.78875639 0.79081633 0.78510998 0.79626486 0.78040541 0.79145299
0.77721088 0.78231293 0.76430976 0.78209459]
mean value: 0.7838734118999983
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [3.75282788 3.81552005 3.63571382 3.76962566 3.72686815 3.62004423
3.12245107 3.29060173 3.8202343 3.61941886]
mean value: 3.6173305749893188
key: score_time
value: [0.01521969 0.01497841 0.0150671 0.01494384 0.01501822 0.01528382
0.01300669 0.0128808 0.01538062 0.01552033]
mean value: 0.014729952812194825
key: test_mcc
value: [0.86504296 0.85816689 0.89249761 0.82205752 0.89704631 0.82447186
0.84111937 0.82824452 0.92973479 0.87402845]
mean value: 0.863241028464841
key: train_mcc
value: [0.99599599 0.99799598 0.9939999 0.99598796 0.99399998 0.99598796
0.99398395 0.97621121 0.997998 0.99400594]
mean value: 0.993616687465768
key: test_accuracy
value: [0.92792793 0.92792793 0.94594595 0.90990991 0.94594595 0.90990991
0.91891892 0.90990991 0.96363636 0.93636364]
mean value: 0.9296396396396397
key: train_accuracy
value: [0.99799398 0.99899699 0.99699097 0.99799398 0.99699097 0.99799398
0.99699097 0.98796389 0.998998 0.99699399]
mean value: 0.9967907731209661
key: test_fscore
value: [0.93220339 0.92982456 0.94642857 0.9122807 0.94915254 0.91525424
0.92307692 0.91666667 0.96491228 0.9380531 ]
mean value: 0.9327852971868469
key: train_fscore
value: [0.99799197 0.998999 0.997003 0.99799599 0.996997 0.99799197
0.99699097 0.98809524 0.998999 0.997003 ]
mean value: 0.9968067127741923
key: test_precision
value: [0.87301587 0.89830508 0.92982456 0.88135593 0.90322581 0.87096774
0.8852459 0.859375 0.93220339 0.9137931 ]
mean value: 0.894731239467376
key: train_precision
value: [1. 0.998 0.9940239 0.99799599 0.99401198 0.99799197
0.99599198 0.97647059 0.998 0.9940239 ]
mean value: 0.9946510316871529
key: test_recall
value: [1. 0.96363636 0.96363636 0.94545455 1. 0.96428571
0.96428571 0.98214286 1. 0.96363636]
mean value: 0.9747077922077922
key: train_recall
value: [0.99599198 1. 1. 0.99799599 1. 0.99799197
0.99799197 1. 1. 1. ]
mean value: 0.9989971911694876
key: test_roc_auc
value: [0.92857143 0.92824675 0.9461039 0.91022727 0.94545455 0.90941558
0.91850649 0.90925325 0.96363636 0.93636364]
mean value: 0.9295779220779221
key: train_roc_auc
value: [0.99799599 0.99899598 0.99698795 0.99799398 0.99699399 0.99799398
0.99699198 0.98797595 0.998998 0.99699399]
mean value: 0.9967921787349799
key: test_jcc
value: [0.87301587 0.86885246 0.89830508 0.83870968 0.90322581 0.84375
0.85714286 0.84615385 0.93220339 0.88333333]
mean value: 0.8744692327109542
key: train_jcc
value: [0.99599198 0.998 0.9940239 0.996 0.99401198 0.99599198
0.994 0.97647059 0.998 0.9940239 ]
mean value: 0.9936514340984011
MCC on Blind test: 0.58
Accuracy on Blind test: 0.83
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.06268549 0.04895139 0.04552436 0.04649806 0.05077744 0.04187751
0.04737687 0.04394555 0.04726696 0.0483892 ]
mean value: 0.048329281806945804
key: score_time
value: [0.0103085 0.01011729 0.00918198 0.01004171 0.00924993 0.00930953
0.00923204 0.00923681 0.00998807 0.00920033]
mean value: 0.009586620330810546
key: test_mcc
value: [0.96459895 0.80845318 0.91127765 0.93038564 0.81771432 0.89414155
0.856354 0.94608644 0.92973479 0.92973479]
mean value: 0.8988481309656139
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98198198 0.9009009 0.95495495 0.96396396 0.9009009 0.94594595
0.92792793 0.97297297 0.96363636 0.96363636]
mean value: 0.9476822276822277
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98214286 0.90598291 0.95575221 0.96491228 0.91056911 0.94827586
0.92982456 0.97345133 0.96491228 0.96491228]
mean value: 0.9500735674217566
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96491228 0.85483871 0.93103448 0.93220339 0.8358209 0.91666667
0.9137931 0.96491228 0.93220339 0.93220339]
mean value: 0.9178588588968405
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96363636 0.98181818 1. 1. 0.98214286
0.94642857 0.98214286 1. 1. ]
mean value: 0.9856168831168831
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.90146104 0.95519481 0.96428571 0.9 0.94561688
0.92775974 0.97288961 0.96363636 0.96363636]
mean value: 0.9476623376623377
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96491228 0.828125 0.91525424 0.93220339 0.8358209 0.90163934
0.86885246 0.94827586 0.93220339 0.93220339]
mean value: 0.9059490248351457
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.66
Accuracy on Blind test: 0.87
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.17697835 0.17841768 0.18904471 0.1832931 0.18118811 0.18383741
0.18178773 0.18106866 0.18363118 0.18622208]
mean value: 0.18254690170288085
key: score_time
value: [0.01878119 0.01896453 0.01970887 0.0195868 0.01971936 0.02070141
0.02028441 0.02025509 0.01984024 0.02044296]
mean value: 0.019828486442565917
key: test_mcc
value: [0.94735177 0.85816689 0.91127765 0.83912942 0.89704631 0.94608644
0.83793444 0.91003577 0.94686415 0.89149871]
mean value: 0.8985391553279924
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.92792793 0.95495495 0.91891892 0.94594595 0.97297297
0.91891892 0.95495495 0.97272727 0.94545455]
mean value: 0.9485749385749386
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97345133 0.92982456 0.95575221 0.92035398 0.94915254 0.97345133
0.92035398 0.95575221 0.97345133 0.94642857]
mean value: 0.9497972046886377
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94827586 0.89830508 0.93103448 0.89655172 0.90322581 0.96491228
0.9122807 0.94736842 0.94827586 0.92982456]
mean value: 0.9280054787144139
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96363636 0.98181818 0.94545455 1. 0.98214286
0.92857143 0.96428571 1. 0.96363636]
mean value: 0.9729545454545454
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97321429 0.92824675 0.95519481 0.91915584 0.94545455 0.97288961
0.91883117 0.95487013 0.97272727 0.94545455]
mean value: 0.9486038961038961
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94827586 0.86885246 0.91525424 0.85245902 0.90322581 0.94827586
0.85245902 0.91525424 0.94827586 0.89830508]
mean value: 0.9050637443783822
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.5
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01411533 0.01404738 0.01408291 0.01340747 0.01328039 0.01412058
0.01400352 0.01396227 0.01360893 0.01377416]
mean value: 0.013840293884277344
key: score_time
value: [0.01007175 0.0093317 0.01000047 0.00991845 0.01003599 0.00996137
0.00995827 0.01003003 0.01001978 0.00936818]
mean value: 0.009869599342346191
key: test_mcc
value: [0.86504296 0.76054489 0.7763355 0.75592959 0.7964953 0.86075909
0.8049036 0.82447186 0.86373129 0.79507028]
mean value: 0.810328436196157
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92792793 0.87387387 0.88288288 0.87387387 0.89189189 0.92792793
0.9009009 0.90990991 0.92727273 0.89090909]
mean value: 0.9007371007371007
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93220339 0.88333333 0.8907563 0.88135593 0.90163934 0.93220339
0.90598291 0.91525424 0.93220339 0.9 ]
mean value: 0.9074932225082594
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.87301587 0.81538462 0.828125 0.82539683 0.83333333 0.88709677
0.86885246 0.87096774 0.87301587 0.83076923]
mean value: 0.8505957726061176
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96363636 0.96363636 0.94545455 0.98214286 0.98214286
0.94642857 0.96428571 1. 0.98181818]
mean value: 0.9729545454545454
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92857143 0.87467532 0.8836039 0.87451299 0.89107143 0.92743506
0.90048701 0.90941558 0.92727273 0.89090909]
mean value: 0.9007954545454546
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.87301587 0.79104478 0.8030303 0.78787879 0.82089552 0.87301587
0.828125 0.84375 0.87301587 0.81818182]
mean value: 0.831195382664599
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.45
Accuracy on Blind test: 0.79
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.90355897 2.85356164 2.86990976 2.85194898 2.86117744 2.82591081
2.8085959 2.80563903 2.87003446 2.81331229]
mean value: 2.8463649272918703
key: score_time
value: [0.10372114 0.09917402 0.10634851 0.10654712 0.10809731 0.10232544
0.10662699 0.10796857 0.09895921 0.15853858]
mean value: 0.10983068943023681
key: test_mcc
value: [0.94735177 0.89427626 0.9461039 0.94735177 0.88077101 0.88077101
0.87508299 0.92850223 0.96427411 0.94686415]
mean value: 0.9211349204607814
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.94594595 0.97297297 0.97297297 0.93693694 0.93693694
0.93693694 0.96396396 0.98181818 0.97272727]
mean value: 0.9594185094185095
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97345133 0.94736842 0.97297297 0.97345133 0.94117647 0.94117647
0.93913043 0.96491228 0.98214286 0.97345133]
mean value: 0.960923389013018
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94827586 0.91525424 0.96428571 0.94827586 0.88888889 0.88888889
0.91525424 0.94827586 0.96491228 0.94827586]
mean value: 0.9330587695617379
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.98181818 0.98181818 1. 1. 1.
0.96428571 0.98214286 1. 1. ]
mean value: 0.9910064935064935
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97321429 0.94626623 0.97305195 0.97321429 0.93636364 0.93636364
0.93668831 0.9637987 0.98181818 0.97272727]
mean value: 0.9593506493506494
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94827586 0.9 0.94736842 0.94827586 0.88888889 0.88888889
0.8852459 0.93220339 0.96491228 0.94827586]
mean value: 0.9252335357208913
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.89
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.18460464 1.13488936 1.1510613 1.13592005 1.1451478 1.14192295
1.12877488 1.1883049 1.133883 1.14420176]
mean value: 1.1488710641860962
key: score_time
value: [0.23078775 0.26031542 0.23725033 0.27050185 0.21965718 0.2571454
0.18469596 0.23342299 0.19362545 0.28665137]
mean value: 0.2374053716659546
key: test_mcc
value: [0.91127765 0.83912942 0.91003577 0.89188312 0.88077101 0.86075909
0.91355091 0.91003577 0.94561086 0.92973479]
mean value: 0.899278839053012
key: train_mcc
value: [0.96427099 0.95213028 0.95624352 0.95819837 0.95441822 0.95017394
0.95810779 0.95429502 0.95628827 0.95618835]
mean value: 0.9560314751624464
key: test_accuracy
value: [0.95495495 0.91891892 0.95495495 0.94594595 0.93693694 0.92792793
0.95495495 0.95495495 0.97272727 0.96363636]
mean value: 0.9485913185913186
key: train_accuracy
value: [0.98194584 0.97592778 0.9779338 0.97893681 0.97693079 0.97492477
0.97893681 0.97693079 0.97795591 0.97795591]
mean value: 0.9778379225853915
key: test_fscore
value: [0.95575221 0.92035398 0.95412844 0.94545455 0.94117647 0.93220339
0.95726496 0.95575221 0.97297297 0.96491228]
mean value: 0.9499971464259592
key: train_fscore
value: [0.98221344 0.97623762 0.97826087 0.97922849 0.97729516 0.97522299
0.97914598 0.97725025 0.97826087 0.97821782]
mean value: 0.9781333491434867
key: test_precision
value: [0.93103448 0.89655172 0.96296296 0.94545455 0.88888889 0.88709677
0.91803279 0.94736842 0.96428571 0.93220339]
mean value: 0.9273879690450597
key: train_precision
value: [0.96881092 0.96477495 0.96491228 0.96679688 0.96116505 0.962818
0.96856582 0.96296296 0.96491228 0.9667319 ]
mean value: 0.9652451032642626
key: test_recall
value: [0.98181818 0.94545455 0.94545455 0.94545455 1. 0.98214286
1. 0.96428571 0.98181818 1. ]
mean value: 0.9746428571428571
key: train_recall
value: [0.99599198 0.98797595 0.99198397 0.99198397 0.9939759 0.98795181
0.98995984 0.99196787 0.99198397 0.98997996]
mean value: 0.9913755221285945
key: test_roc_auc
value: [0.95519481 0.91915584 0.95487013 0.94594156 0.93636364 0.92743506
0.95454545 0.95487013 0.97272727 0.96363636]
mean value: 0.948474025974026
key: train_roc_auc
value: [0.98193173 0.97591569 0.97791969 0.97892371 0.97694787 0.97493783
0.97894786 0.97694586 0.97795591 0.97795591]
mean value: 0.977838206533549
key: test_jcc
value: [0.91525424 0.85245902 0.9122807 0.89655172 0.88888889 0.87301587
0.91803279 0.91525424 0.94736842 0.93220339]
mean value: 0.9051309276535179
key: train_jcc
value: [0.96504854 0.95357834 0.95744681 0.95930233 0.95559846 0.9516441
0.95914397 0.95551257 0.95744681 0.95736434]
mean value: 0.9572086261518494
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02212071 0.01712632 0.017241 0.01725721 0.03649306 0.01726389
0.01711011 0.01731777 0.01719499 0.01736307]
mean value: 0.019648814201354982
key: score_time
value: [0.01253033 0.01254916 0.01253748 0.01241183 0.01262856 0.01252508
0.01236796 0.01235127 0.01247168 0.01250958]
mean value: 0.012488293647766113
key: test_mcc
value: [0.53168696 0.53168696 0.5365027 0.60383519 0.64303575 0.6048892
0.65842676 0.60383519 0.71097366 0.62075223]
mean value: 0.6045624604926498
key: train_mcc
value: [0.62690345 0.63122939 0.61741439 0.64894709 0.632912 0.6491123
0.62326104 0.64713651 0.64529188 0.60949387]
mean value: 0.6331701922310734
key: test_accuracy
value: [0.76576577 0.76576577 0.76576577 0.8018018 0.81981982 0.8018018
0.82882883 0.8018018 0.85454545 0.80909091]
mean value: 0.8014987714987715
key: train_accuracy
value: [0.81344032 0.81544634 0.80842528 0.82447342 0.81644935 0.82447342
0.8114343 0.82347041 0.82264529 0.80460922]
mean value: 0.8164867347533582
key: test_fscore
value: [0.75925926 0.75925926 0.74509804 0.7962963 0.83050847 0.81034483
0.83478261 0.80701754 0.85964912 0.8 ]
mean value: 0.8002215431555298
key: train_fscore
value: [0.81287726 0.81262729 0.80450358 0.82482482 0.81681682 0.82621648
0.80777096 0.82539683 0.8224674 0.80162767]
mean value: 0.815512912261371
key: test_precision
value: [0.77358491 0.77358491 0.80851064 0.81132075 0.79032258 0.78333333
0.81355932 0.79310345 0.83050847 0.84 ]
mean value: 0.8017828363200135
key: train_precision
value: [0.81616162 0.82608696 0.82217573 0.824 0.81437126 0.8172888
0.82291667 0.81568627 0.82329317 0.81404959]
mean value: 0.819603006460176
key: test_recall
value: [0.74545455 0.74545455 0.69090909 0.78181818 0.875 0.83928571
0.85714286 0.82142857 0.89090909 0.76363636]
mean value: 0.8011038961038961
key: train_recall
value: [0.80961924 0.7995992 0.78757515 0.8256513 0.81927711 0.83534137
0.79317269 0.83534137 0.82164329 0.78957916]
mean value: 0.8116799864789821
key: test_roc_auc
value: [0.76558442 0.76558442 0.7650974 0.80162338 0.81931818 0.80146104
0.82857143 0.80162338 0.85454545 0.80909091]
mean value: 0.80125
key: train_roc_auc
value: [0.81344416 0.81546225 0.80844621 0.82447224 0.81645218 0.82448431
0.811416 0.82348231 0.82264529 0.80460922]
mean value: 0.8164914165680759
key: test_jcc
value: [0.6119403 0.6119403 0.59375 0.66153846 0.71014493 0.68115942
0.71641791 0.67647059 0.75384615 0.66666667]
mean value: 0.668387472557535
key: train_jcc
value: [0.68474576 0.68439108 0.67294521 0.70187394 0.69035533 0.70389171
0.67753002 0.7027027 0.69846678 0.66893039]
mean value: 0.6885832913576179
MCC on Blind test: 0.5
Accuracy on Blind test: 0.79
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.15152597 0.13697934 0.13371658 0.28721023 0.12940335 0.16948915
0.12985754 0.13179946 0.12889552 0.12940407]
mean value: 0.15282812118530273
key: score_time
value: [0.01140881 0.01162529 0.01171136 0.01240182 0.01138711 0.01143169
0.01129651 0.01137733 0.01139212 0.01139045]
mean value: 0.011542248725891113
key: test_mcc
value: [0.93038564 0.91368563 0.9461039 0.96459895 0.86471225 0.88077101
0.89414155 0.93029809 0.92973479 0.92973479]
mean value: 0.9184166595432225
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96396396 0.95495495 0.97297297 0.98198198 0.92792793 0.93693694
0.94594595 0.96396396 0.96363636 0.96363636]
mean value: 0.9575921375921376
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96491228 0.95652174 0.97297297 0.98214286 0.93333333 0.94117647
0.94827586 0.96551724 0.96491228 0.96491228]
mean value: 0.9594677318721373
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93220339 0.91666667 0.96428571 0.96491228 0.875 0.88888889
0.91666667 0.93333333 0.93220339 0.93220339]
mean value: 0.9256363720034549
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.98181818 1. 1. 1.
0.98214286 1. 1. 1. ]
mean value: 0.9963961038961039
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.95535714 0.97305195 0.98214286 0.92727273 0.93636364
0.94561688 0.96363636 0.96363636 0.96363636]
mean value: 0.9575
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93220339 0.91666667 0.94736842 0.96491228 0.875 0.88888889
0.90163934 0.93333333 0.93220339 0.93220339]
mean value: 0.9224419104397095
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.06950426 0.11023021 0.08584619 0.06061864 0.09362602 0.07184792
0.06709003 0.10699701 0.08673477 0.05997014]
mean value: 0.08124651908874511
key: score_time
value: [0.01922679 0.02629018 0.01244116 0.01238298 0.01918435 0.0123589
0.01421976 0.0239265 0.01261425 0.01244736]
mean value: 0.016509222984313964
key: test_mcc
value: [0.58827674 0.78434561 0.80286425 0.66058982 0.73247207 0.66597107
0.80802876 0.67720229 0.65465367 0.74743385]
mean value: 0.7121838124804615
key: train_mcc
value: [0.82706632 0.81733708 0.82049509 0.82325845 0.81464593 0.83301414
0.80679361 0.80033567 0.82344592 0.8154123 ]
mean value: 0.818180451473204
key: test_accuracy
value: [0.79279279 0.89189189 0.9009009 0.82882883 0.86486486 0.82882883
0.9009009 0.83783784 0.82727273 0.87272727]
mean value: 0.8546846846846847
key: train_accuracy
value: [0.91273821 0.90772317 0.90872618 0.9107322 0.90672016 0.91574724
0.90270812 0.8996991 0.91082164 0.90681363]
mean value: 0.908242965369053
key: test_fscore
value: [0.8 0.89285714 0.89719626 0.83478261 0.87179487 0.84297521
0.90756303 0.84482759 0.82882883 0.87719298]
mean value: 0.859801851434343
key: train_fscore
value: [0.9154519 0.91085271 0.91258405 0.91367604 0.90909091 0.91812865
0.90536585 0.90196078 0.91367604 0.90979631]
mean value: 0.9110583263662413
key: test_precision
value: [0.76666667 0.87719298 0.92307692 0.8 0.83606557 0.78461538
0.85714286 0.81666667 0.82142857 0.84745763]
mean value: 0.8330313252942346
key: train_precision
value: [0.88867925 0.88180113 0.87638376 0.88533835 0.88571429 0.89204545
0.88045541 0.88122605 0.88533835 0.88157895]
mean value: 0.8838560975791193
key: test_recall
value: [0.83636364 0.90909091 0.87272727 0.87272727 0.91071429 0.91071429
0.96428571 0.875 0.83636364 0.90909091]
mean value: 0.8897077922077922
key: train_recall
value: [0.94388778 0.94188377 0.95190381 0.94388778 0.93373494 0.94578313
0.93172691 0.92369478 0.94388778 0.93987976]
mean value: 0.9400270420358791
key: test_roc_auc
value: [0.79318182 0.89204545 0.90064935 0.82922078 0.86444805 0.82808442
0.90032468 0.8375 0.82727273 0.87272727]
mean value: 0.8545454545454545
key: train_roc_auc
value: [0.91270694 0.90768887 0.90868283 0.91069891 0.90674723 0.91577734
0.9027372 0.89972314 0.91082164 0.90681363]
mean value: 0.908239772718127
key: test_jcc
value: [0.66666667 0.80645161 0.81355932 0.71641791 0.77272727 0.72857143
0.83076923 0.73134328 0.70769231 0.78125 ]
mean value: 0.7555449035393881
key: train_jcc
value: [0.84408602 0.83629893 0.83922261 0.84107143 0.83333333 0.84864865
0.82709447 0.82142857 0.84107143 0.83451957]
mean value: 0.8366775026391152
MCC on Blind test: 0.59
Accuracy on Blind test: 0.83
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01626134 0.01653409 0.01663375 0.01649666 0.01635027 0.01643872
0.01659584 0.01632452 0.01655459 0.01650786]
mean value: 0.016469764709472656
key: score_time
value: [0.01276565 0.01302242 0.01256537 0.01214433 0.01233745 0.01241207
0.0123868 0.01220846 0.01229262 0.012537 ]
mean value: 0.012467217445373536
key: test_mcc
value: [0.60383519 0.53417408 0.49641957 0.49561285 0.69373177 0.53199093
0.6576811 0.60409227 0.54626778 0.69378191]
mean value: 0.5857587444320642
key: train_mcc
value: [0.59486707 0.59700354 0.60695011 0.60321575 0.60327781 0.593109
0.59093327 0.60693429 0.60556194 0.57923276]
mean value: 0.5981085545854256
key: test_accuracy
value: [0.8018018 0.76576577 0.74774775 0.74774775 0.84684685 0.76576577
0.82882883 0.8018018 0.77272727 0.84545455]
mean value: 0.7924488124488125
key: train_accuracy
value: [0.79739218 0.79839519 0.80341023 0.80140421 0.80140421 0.79638917
0.79538616 0.80341023 0.80260521 0.78957916]
mean value: 0.7989375943461647
key: test_fscore
value: [0.7962963 0.75 0.73584906 0.74074074 0.84955752 0.76363636
0.83185841 0.8 0.77876106 0.83809524]
mean value: 0.7884794686522855
key: train_fscore
value: [0.7959596 0.79593909 0.80161943 0.79795918 0.79713115 0.79264556
0.79268293 0.80121704 0.79918451 0.78787879]
mean value: 0.796221726221148
key: test_precision
value: [0.81132075 0.79591837 0.76470588 0.75471698 0.84210526 0.77777778
0.8245614 0.81481481 0.75862069 0.88 ]
mean value: 0.8024541934463368
key: train_precision
value: [0.80244399 0.80658436 0.80981595 0.81288981 0.81380753 0.80665281
0.80246914 0.80942623 0.81327801 0.79429735]
mean value: 0.8071665181788477
key: test_recall
value: [0.78181818 0.70909091 0.70909091 0.72727273 0.85714286 0.75
0.83928571 0.78571429 0.8 0.8 ]
mean value: 0.7759415584415584
key: train_recall
value: [0.78957916 0.78557114 0.79358717 0.78356713 0.7811245 0.77911647
0.78313253 0.79317269 0.78557114 0.78156313]
mean value: 0.7855985062494467
key: test_roc_auc
value: [0.80162338 0.76525974 0.7474026 0.74756494 0.84675325 0.76590909
0.82873377 0.80194805 0.77272727 0.84545455]
mean value: 0.7923376623376623
key: train_roc_auc
value: [0.79740002 0.79840806 0.80342009 0.80142212 0.80138389 0.79637186
0.79537388 0.80339997 0.80260521 0.78957916]
mean value: 0.7989364270710095
key: test_jcc
value: [0.66153846 0.6 0.58208955 0.58823529 0.73846154 0.61764706
0.71212121 0.66666667 0.63768116 0.72131148]
mean value: 0.6525752418797988
key: train_jcc
value: [0.66107383 0.66104553 0.66891892 0.66383701 0.66269165 0.65651438
0.65656566 0.66835871 0.6655348 0.65 ]
mean value: 0.6614540497740491
MCC on Blind test: 0.48
Accuracy on Blind test: 0.78
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0310688 0.02964067 0.03017783 0.03573632 0.03960013 0.02891755
0.02455711 0.03152013 0.0423491 0.03097415]
mean value: 0.032454180717468264
key: score_time
value: [0.0123229 0.01248646 0.01263022 0.01230764 0.01214814 0.01245093
0.01247001 0.01256704 0.01229119 0.01259637]
mean value: 0.012427091598510742
key: test_mcc
value: [0.58827674 0.79230071 0.75530907 0.50764074 0.80188377 0.67564935
0.65445146 0.77224584 0.64051262 0.71334833]
mean value: 0.6901618622911714
key: train_mcc
value: [0.78427841 0.76899725 0.73866251 0.57031401 0.80658958 0.73431765
0.69871873 0.76303423 0.61951256 0.72764254]
mean value: 0.721206747763299
key: test_accuracy
value: [0.79279279 0.89189189 0.87387387 0.72972973 0.9009009 0.83783784
0.81981982 0.87387387 0.79090909 0.85454545]
mean value: 0.8366175266175266
key: train_accuracy
value: [0.88966901 0.88264794 0.86760281 0.75827482 0.90270812 0.86459378
0.84653962 0.87863591 0.78156313 0.86072144]
mean value: 0.8532956585186421
key: test_fscore
value: [0.8 0.89830508 0.8627451 0.65116279 0.90265487 0.83928571
0.83870968 0.88888889 0.82706767 0.84615385]
mean value: 0.8354973636660027
key: train_fscore
value: [0.89563567 0.88825215 0.86105263 0.69377382 0.8998968 0.85592316
0.85552408 0.88552507 0.81923715 0.85101822]
mean value: 0.8505838757358823
key: test_precision
value: [0.76666667 0.84126984 0.93617021 0.90322581 0.89473684 0.83928571
0.76470588 0.8 0.70512821 0.89795918]
mean value: 0.8349148354699671
key: train_precision
value: [0.85045045 0.84854015 0.90687361 0.94791667 0.92569002 0.91343964
0.80748663 0.8372093 0.69872702 0.91474654]
mean value: 0.8651080026739061
key: test_recall
value: [0.83636364 0.96363636 0.8 0.50909091 0.91071429 0.83928571
0.92857143 1. 1. 0.8 ]
mean value: 0.8587662337662337
key: train_recall
value: [0.94589178 0.93186373 0.81963928 0.54709419 0.87550201 0.80522088
0.90963855 0.93975904 0.98997996 0.79559118]
mean value: 0.8560180602168191
key: test_roc_auc
value: [0.79318182 0.89253247 0.87321429 0.72775974 0.90081169 0.83782468
0.81883117 0.87272727 0.79090909 0.85454545]
mean value: 0.8362337662337662
key: train_roc_auc
value: [0.88961256 0.88259853 0.86765096 0.75848685 0.90268086 0.86453429
0.84660284 0.87869715 0.78156313 0.86072144]
mean value: 0.853314862657041
key: test_jcc
value: [0.66666667 0.81538462 0.75862069 0.48275862 0.82258065 0.72307692
0.72222222 0.8 0.70512821 0.73333333]
mean value: 0.7229771921318083
key: train_jcc
value: [0.81099656 0.79896907 0.75600739 0.5311284 0.81801126 0.74813433
0.74752475 0.79456706 0.69382022 0.74067164]
mean value: 0.743983070132102
MCC on Blind test: 0.61
Accuracy on Blind test: 0.85
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03678513 0.04669952 0.03958678 0.0368154 0.04081035 0.04410028
0.03240681 0.03612375 0.04343247 0.0369277 ]
mean value: 0.039368820190429685
key: score_time
value: [0.0123539 0.01226425 0.01249218 0.01270509 0.01231241 0.01213574
0.0124712 0.01646805 0.01875472 0.01255918]
mean value: 0.013451671600341797
key: test_mcc
value: [0.57674936 0.27435929 0.4414112 0.61409543 0.84439989 0.71168831
0.78567192 0.44752365 0.7823356 0.73029674]
mean value: 0.6208531392970608
key: train_mcc
value: [0.61504983 0.35761898 0.45919707 0.70513165 0.81895676 0.80887671
0.8079072 0.49556275 0.81080701 0.82264753]
mean value: 0.6701755481213304
key: test_accuracy
value: [0.75675676 0.58558559 0.67567568 0.79279279 0.91891892 0.85585586
0.89189189 0.67567568 0.89090909 0.86363636]
mean value: 0.7907698607698608
key: train_accuracy
value: [0.77632899 0.61885657 0.68004012 0.84052156 0.90772317 0.90371113
0.90270812 0.70511535 0.90480962 0.90881764]
mean value: 0.8148632269554154
key: test_fscore
value: [0.8 0.3030303 0.52631579 0.75268817 0.92436975 0.85714286
0.89655172 0.53846154 0.88888889 0.86956522]
mean value: 0.7357014238468678
key: train_fscore
value: [0.81676253 0.39297125 0.53701016 0.81703107 0.91170825 0.90062112
0.90628019 0.58938547 0.90216272 0.91358025]
mean value: 0.768751301189569
key: test_precision
value: [0.675 0.90909091 0.95238095 0.92105263 0.87301587 0.85714286
0.86666667 0.95454545 0.90566038 0.83333333]
mean value: 0.8747889055113485
key: train_precision
value: [0.69220056 0.96850394 0.97368421 0.95945946 0.87316176 0.92948718
0.87337058 0.96788991 0.9279661 0.86823105]
mean value: 0.9033954742454171
key: test_recall
value: [0.98181818 0.18181818 0.36363636 0.63636364 0.98214286 0.85714286
0.92857143 0.375 0.87272727 0.90909091]
mean value: 0.7088311688311688
key: train_recall
value: [0.99599198 0.24649299 0.37074148 0.71142285 0.95381526 0.87349398
0.94176707 0.42369478 0.87775551 0.96392786]
mean value: 0.7359103749668011
key: test_roc_auc
value: [0.75876623 0.58198052 0.67288961 0.7913961 0.91834416 0.85584416
0.89155844 0.67840909 0.89090909 0.86363636]
mean value: 0.7903733766233766
key: train_roc_auc
value: [0.77610844 0.61923043 0.68035066 0.84065118 0.90776935 0.90368086
0.90274726 0.70483336 0.90480962 0.90881764]
mean value: 0.814899880081448
key: test_jcc
value: [0.66666667 0.17857143 0.35714286 0.60344828 0.859375 0.75
0.8125 0.36842105 0.8 0.76923077]
mean value: 0.616535605010537
key: train_jcc
value: [0.69027778 0.2445328 0.36706349 0.69066148 0.8377425 0.81920904
0.82862191 0.41782178 0.8217636 0.84090909]
mean value: 0.6558603479044525
MCC on Blind test: 0.63
Accuracy on Blind test: 0.84
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.30124474 0.28551722 0.28614354 0.28642416 0.2862277 0.28524876
0.28583145 0.28529334 0.28703308 0.28703952]
mean value: 0.28760035037994386
key: score_time
value: [0.01587796 0.01592803 0.01595163 0.01585817 0.01593447 0.01601052
0.01596475 0.01596737 0.01586986 0.0159719 ]
mean value: 0.015933465957641602
key: test_mcc
value: [0.82480596 0.85644694 0.856354 0.83798701 0.86471225 0.84111937
0.84111937 0.89704631 0.87402845 0.9104463 ]
mean value: 0.8604065967920353
key: train_mcc
value: [0.94009038 0.93823637 0.92852803 0.93240093 0.94057906 0.92815139
0.95000524 0.92424985 0.94621294 0.93734857]
mean value: 0.9365802768993494
key: test_accuracy
value: [0.90990991 0.92792793 0.92792793 0.91891892 0.92792793 0.91891892
0.91891892 0.94594595 0.93636364 0.95454545]
mean value: 0.9287305487305487
key: train_accuracy
value: [0.96990973 0.96890672 0.96389168 0.96589769 0.96990973 0.96389168
0.97492477 0.96188566 0.97294589 0.96793587]
mean value: 0.9680099416485931
key: test_fscore
value: [0.9137931 0.92857143 0.92592593 0.91891892 0.93333333 0.92307692
0.92307692 0.94915254 0.9380531 0.95575221]
mean value: 0.9309654408459123
key: train_fscore
value: [0.97029703 0.96939783 0.96463654 0.96653543 0.97047244 0.96435644
0.97512438 0.96245059 0.97329377 0.96881092]
mean value: 0.9685375365555099
key: test_precision
value: [0.86885246 0.9122807 0.94339623 0.91071429 0.875 0.8852459
0.8852459 0.90322581 0.9137931 0.93103448]
mean value: 0.9028788868837357
key: train_precision
value: [0.95890411 0.95525292 0.9460501 0.94970986 0.95173745 0.95117188
0.96646943 0.94747082 0.9609375 0.943074 ]
mean value: 0.9530778064480604
key: test_recall
value: [0.96363636 0.94545455 0.90909091 0.92727273 1. 0.96428571
0.96428571 1. 0.96363636 0.98181818]
mean value: 0.9619480519480519
key: train_recall
value: [0.98196393 0.98396794 0.98396794 0.98396794 0.98995984 0.97791165
0.98393574 0.97791165 0.98597194 0.99599198]
mean value: 0.9845550538828661
key: test_roc_auc
value: [0.91038961 0.92808442 0.92775974 0.91899351 0.92727273 0.91850649
0.91850649 0.94545455 0.93636364 0.95454545]
mean value: 0.9285876623376623
key: train_roc_auc
value: [0.96989763 0.9688916 0.96387152 0.96587955 0.96992982 0.96390572
0.9749338 0.96190172 0.97294589 0.96793587]
mean value: 0.9680093117962834
key: test_jcc
value: [0.84126984 0.86666667 0.86206897 0.85 0.875 0.85714286
0.85714286 0.90322581 0.88333333 0.91525424]
mean value: 0.8711104564812545
key: train_jcc
value: [0.94230769 0.94061303 0.9316888 0.9352381 0.94263862 0.93116635
0.95145631 0.92761905 0.94797688 0.93950851]
mean value: 0.9390213333766735
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.21933794 0.23009133 0.23146009 0.22986817 0.23455238 0.23486233
0.23559117 0.22639084 0.224648 0.22071886]
mean value: 0.22875211238861085
key: score_time
value: [0.03442264 0.04063106 0.02764821 0.03920436 0.03479433 0.04123449
0.03968191 0.03944087 0.02062654 0.03691602]
mean value: 0.035460042953491214
key: test_mcc
value: [0.86102173 0.86102173 0.94608644 0.91006494 0.83319558 0.89704631
0.89414155 0.89414155 0.92973479 0.91287093]
mean value: 0.893932555011654
key: train_mcc
value: [0.99198387 0.9939999 0.99197592 0.9900196 1. 0.99599599
0.99200792 0.99599599 0.99201584 1. ]
mean value: 0.9943995039269277
key: test_accuracy
value: [0.92792793 0.92792793 0.97297297 0.95495495 0.90990991 0.94594595
0.94594595 0.94594595 0.96363636 0.95454545]
mean value: 0.944971334971335
key: train_accuracy
value: [0.99598796 0.99699097 0.99598796 0.99498495 1. 0.99799398
0.99598796 0.99799398 0.99599198 1. ]
mean value: 0.9971919767317986
key: test_fscore
value: [0.93103448 0.93103448 0.97247706 0.95495495 0.91803279 0.94915254
0.94827586 0.94827586 0.96491228 0.95652174]
mean value: 0.9474672057920627
key: train_fscore
value: [0.996 0.997003 0.99599198 0.99501496 1. 0.99799599
0.996 0.99799599 0.99600798 1. ]
mean value: 0.9972009904105401
key: test_precision
value: [0.8852459 0.8852459 0.98148148 0.94642857 0.84848485 0.90322581
0.91666667 0.91666667 0.93220339 0.91666667]
mean value: 0.9132315900955711
key: train_precision
value: [0.99401198 0.9940239 0.99599198 0.99007937 1. 0.996
0.99203187 0.996 0.99204771 1. ]
mean value: 0.995018681570533
key: test_recall
value: [0.98181818 0.98181818 0.96363636 0.96363636 1. 1.
0.98214286 0.98214286 1. 1. ]
mean value: 0.9855194805194805
key: train_recall
value: [0.99799599 1. 0.99599198 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9993987975951903
key: test_roc_auc
value: [0.92840909 0.92840909 0.97288961 0.95503247 0.90909091 0.94545455
0.94561688 0.94561688 0.96363636 0.95454545]
mean value: 0.9448701298701299
key: train_roc_auc
value: [0.99598595 0.99698795 0.99598796 0.99497992 1. 0.99799599
0.99599198 0.99799599 0.99599198 1. ]
mean value: 0.9971917731044418
key: test_jcc
value: [0.87096774 0.87096774 0.94642857 0.9137931 0.84848485 0.90322581
0.90163934 0.90163934 0.93220339 0.91666667]
mean value: 0.9006016558706041
key: train_jcc
value: [0.99203187 0.9940239 0.99201597 0.99007937 1. 0.996
0.99203187 0.996 0.99204771 1. ]
mean value: 0.9944230696263322
MCC on Blind test: 0.68
Accuracy on Blind test: 0.87
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.45978689 0.58875775 0.49073029 0.46216607 0.47802901 0.42841339
0.42438769 0.4908936 0.44094849 0.39796805]
mean value: 0.4662081241607666
key: score_time
value: [0.02401781 0.04390836 0.04260111 0.04216743 0.02424955 0.04410362
0.04739928 0.04214454 0.02356482 0.0384655 ]
mean value: 0.0372622013092041
key: test_mcc
value: [0.83369955 0.76054489 0.77216596 0.73587873 0.75979502 0.81228039
0.75530907 0.76868784 0.86373129 0.82035423]
mean value: 0.788244696340273
key: train_mcc
value: [0.97408004 0.97211078 0.97014526 0.97408004 0.96613365 0.97204135
0.97219811 0.97006835 0.96821588 0.97009768]
mean value: 0.9709171117615186
key: test_accuracy
value: [0.90990991 0.87387387 0.88288288 0.86486486 0.87387387 0.9009009
0.87387387 0.88288288 0.92727273 0.90909091]
mean value: 0.8899426699426699
key: train_accuracy
value: [0.98696088 0.98595787 0.98495486 0.98696088 0.98294885 0.98595787
0.98595787 0.98495486 0.98396794 0.98496994]
mean value: 0.985359183763716
key: test_fscore
value: [0.91666667 0.88333333 0.88888889 0.87179487 0.8852459 0.90909091
0.88333333 0.88888889 0.93220339 0.9122807 ]
mean value: 0.8971726885221131
key: train_fscore
value: [0.98709037 0.98611111 0.9851338 0.98709037 0.98311817 0.98605578
0.98611111 0.98507463 0.98415842 0.98510427]
mean value: 0.9855048015415081
key: test_precision
value: [0.84615385 0.81538462 0.83870968 0.82258065 0.81818182 0.84615385
0.828125 0.85245902 0.87301587 0.88135593]
mean value: 0.8422120270067477
key: train_precision
value: [0.97834646 0.97642436 0.9745098 0.97834646 0.97249509 0.97826087
0.9745098 0.97633136 0.97260274 0.97637795]
mean value: 0.9758204894124628
key: test_recall
value: [1. 0.96363636 0.94545455 0.92727273 0.96428571 0.98214286
0.94642857 0.92857143 1. 0.94545455]
mean value: 0.9603246753246754
key: train_recall
value: [0.99599198 0.99599198 0.99599198 0.99599198 0.9939759 0.9939759
0.99799197 0.9939759 0.99599198 0.99398798]
mean value: 0.9953867574506443
key: test_roc_auc
value: [0.91071429 0.87467532 0.88344156 0.86542208 0.87305195 0.90016234
0.87321429 0.88246753 0.92727273 0.90909091]
mean value: 0.8899512987012987
key: train_roc_auc
value: [0.98695182 0.9859478 0.98494378 0.98695182 0.9829599 0.98596591
0.98596993 0.9849639 0.98396794 0.98496994]
mean value: 0.9853592727623922
key: test_jcc
value: [0.84615385 0.79104478 0.8 0.77272727 0.79411765 0.83333333
0.79104478 0.8 0.87301587 0.83870968]
mean value: 0.814014720194731
key: train_jcc
value: [0.9745098 0.97260274 0.97070312 0.9745098 0.96679688 0.97249509
0.97260274 0.97058824 0.96881092 0.97064579]
mean value: 0.9714265119740892
MCC on Blind test: 0.49
Accuracy on Blind test: 0.8
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.3042357 1.29457378 1.29446912 1.29047823 1.29637837 1.31723022
1.28884625 1.29490328 1.29829788 1.29580331]
mean value: 1.2975216150283813
key: score_time
value: [0.01003814 0.00967193 0.00972557 0.00951791 0.00977921 0.00974941
0.00971889 0.00971889 0.01020885 0.00968409]
mean value: 0.009781289100646972
key: test_mcc
value: [0.8972375 0.88102763 0.9461039 0.93038564 0.86471225 0.87733514
0.89414155 0.93029809 0.9104463 0.92973479]
mean value: 0.9061422782542579
key: train_mcc
value: [0.99200779 0.98605489 0.99200779 0.98407831 0.98803559 0.9900198
0.9900198 0.98605528 0.98206056 0.99002966]
mean value: 0.9880369474583409
key: test_accuracy
value: [0.94594595 0.93693694 0.97297297 0.96396396 0.92792793 0.93693694
0.94594595 0.96396396 0.95454545 0.96363636]
mean value: 0.9512776412776413
key: train_accuracy
value: [0.99598796 0.99297894 0.99598796 0.99197593 0.99398195 0.99498495
0.99498495 0.99297894 0.99098196 0.99498998]
mean value: 0.9939833528642038
key: test_fscore
value: [0.94827586 0.94017094 0.97297297 0.96491228 0.93333333 0.94017094
0.94827586 0.96551724 0.95575221 0.96491228]
mean value: 0.9534293925958317
key: train_fscore
value: [0.99600798 0.99303483 0.99600798 0.99204771 0.99401198 0.995005
0.995005 0.99302094 0.99104478 0.99501496]
mean value: 0.9940201142152542
key: test_precision
value: [0.90163934 0.88709677 0.96428571 0.93220339 0.875 0.90163934
0.91666667 0.93333333 0.93103448 0.93220339]
mean value: 0.917510243942349
key: train_precision
value: [0.99204771 0.98616601 0.99204771 0.98422091 0.98809524 0.99005964
0.99005964 0.98613861 0.98418972 0.99007937]
mean value: 0.9883104567288739
key: test_recall
value: [1. 1. 0.98181818 1. 1. 0.98214286
0.98214286 1. 0.98181818 1. ]
mean value: 0.9927922077922078
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.99799599 1. ]
mean value: 0.9997995991983968
key: test_roc_auc
value: [0.94642857 0.9375 0.97305195 0.96428571 0.92727273 0.93652597
0.94561688 0.96363636 0.95454545 0.96363636]
mean value: 0.95125
key: train_roc_auc
value: [0.99598394 0.99297189 0.99598394 0.99196787 0.99398798 0.99498998
0.99498998 0.99298597 0.99098196 0.99498998]
mean value: 0.9939833482225495
key: test_jcc
value: [0.90163934 0.88709677 0.94736842 0.93220339 0.875 0.88709677
0.90163934 0.93333333 0.91525424 0.93220339]
mean value: 0.9112835008246805
key: train_jcc
value: [0.99204771 0.98616601 0.99204771 0.98422091 0.98809524 0.99005964
0.99005964 0.98613861 0.98224852 0.99007937]
mean value: 0.988116336467864
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04222107 0.04232812 0.04297519 0.04269028 0.04910517 0.05046797
0.05291557 0.04299235 0.04354048 0.0433166 ]
mean value: 0.045255279541015624
key: score_time
value: [0.01371765 0.01335597 0.01350236 0.0135746 0.01405025 0.01356721
0.03361487 0.01649952 0.0135181 0.01359224]
mean value: 0.015899276733398436
key: test_mcc
value: [0.36094911 0.32868787 0.33903271 0.34503278 0.35489665 0.3790812
0.39886202 0.42911451 0.33333333 0.42754614]
mean value: 0.36965363239191945
key: train_mcc
value: [0.38761246 0.38589628 0.38417641 0.37725875 0.49294912 0.37476691
0.39703819 0.38340652 0.37318816 0.38357064]
mean value: 0.3939863448644553
key: test_accuracy
value: [0.61261261 0.59459459 0.61261261 0.6036036 0.64864865 0.63963964
0.63963964 0.65765766 0.6 0.65454545]
mean value: 0.6263554463554464
key: train_accuracy
value: [0.63089268 0.62988967 0.62888666 0.62487462 0.70210632 0.62286861
0.63590772 0.62788365 0.62224449 0.62825651]
mean value: 0.6353810931793377
key: test_fscore
value: [0.71895425 0.70967742 0.71523179 0.71428571 0.72727273 0.73333333
0.73684211 0.74666667 0.71428571 0.74324324]
mean value: 0.7259792960150879
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
key: train_fscore
value: [0.73060029 0.73006584 0.72953216 0.72740525 0.76814988 0.72594752
0.73289183 0.72860278 0.72581818 0.72899927]
mean value: 0.7328013010149701
key: test_precision
value: [0.56122449 0.55 0.5625 0.55555556 0.59770115 0.58510638
0.58333333 0.59574468 0.55555556 0.59139785]
mean value: 0.5738118996957803
key: train_precision
value: [0.57554787 0.57488479 0.57422325 0.57159221 0.62835249 0.56979405
0.57839721 0.5730725 0.5696347 0.57356322]
mean value: 0.5789062286727364
key: test_recall
value: [1. 1. 0.98181818 1. 0.92857143 0.98214286
1. 1. 1. 1. ]
mean value: 0.9892532467532468
key: train_recall
value: [1. 1. 1. 1. 0.98795181 1.
1. 1. 1. 1. ]
mean value: 0.9987951807228915
key: test_roc_auc
value: [0.61607143 0.59821429 0.61590909 0.60714286 0.6461039 0.63652597
0.63636364 0.65454545 0.6 0.65454545]
mean value: 0.6265422077922078
key: train_roc_auc
value: [0.63052209 0.62951807 0.62851406 0.62449799 0.70239274 0.62324649
0.63627255 0.62825651 0.62224449 0.62825651]
mean value: 0.6353721499223346
key: test_jcc
value: [0.56122449 0.55 0.55670103 0.55555556 0.57142857 0.57894737
0.58333333 0.59574468 0.55555556 0.59139785]
mean value: 0.5699888435331252
key: train_jcc
value: [0.57554787 0.57488479 0.57422325 0.57159221 0.62357414 0.56979405
0.57839721 0.5730725 0.5696347 0.57356322]
mean value: 0.57842839407926
MCC on Blind test: 0.21
Accuracy on Blind test: 0.51
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02591252 0.02014327 0.01982069 0.04004431 0.04341722 0.04567528
0.02161407 0.01908636 0.019238 0.03228545]
mean value: 0.028723716735839844
key: score_time
value: [0.01445341 0.01340151 0.01611209 0.01954746 0.01933146 0.01936555
0.0124433 0.01254368 0.01254106 0.01970911]
mean value: 0.015944862365722658
key: test_mcc
value: [0.67762003 0.805216 0.7658331 0.69959151 0.8049036 0.68237361
0.76868784 0.71884134 0.74743385 0.78181818]
mean value: 0.7452319057416088
key: train_mcc
value: [0.79490575 0.80171572 0.81303246 0.79701682 0.81334074 0.80086148
0.79895075 0.79369889 0.79388846 0.7851833 ]
mean value: 0.7992594371416237
key: test_accuracy
value: [0.83783784 0.9009009 0.88288288 0.84684685 0.9009009 0.83783784
0.88288288 0.85585586 0.87272727 0.89090909]
mean value: 0.870958230958231
key: train_accuracy
value: [0.89669007 0.8996991 0.90471414 0.89769308 0.90571715 0.8996991
0.89869609 0.89568706 0.89579158 0.89178357]
mean value: 0.8986170937662687
key: test_fscore
value: [0.84210526 0.90434783 0.88073394 0.85470085 0.90598291 0.85
0.88888889 0.86666667 0.87719298 0.89090909]
mean value: 0.8761528423803527
key: train_fscore
value: [0.89990282 0.9034749 0.90909091 0.90097087 0.90873786 0.90253411
0.90165531 0.89941973 0.8996139 0.89514563]
mean value: 0.9020546048367907
key: test_precision
value: [0.81355932 0.86666667 0.88888889 0.80645161 0.86885246 0.796875
0.85245902 0.8125 0.84745763 0.89090909]
mean value: 0.844461968393025
key: train_precision
value: [0.87358491 0.87150838 0.86996337 0.87382298 0.87969925 0.87689394
0.87523629 0.86753731 0.86778399 0.86817326]
mean value: 0.87242036699792
key: test_recall
value: [0.87272727 0.94545455 0.87272727 0.90909091 0.94642857 0.91071429
0.92857143 0.92857143 0.90909091 0.89090909]
mean value: 0.9114285714285714
key: train_recall
value: [0.92785571 0.93787575 0.95190381 0.92985972 0.93975904 0.92971888
0.92971888 0.93373494 0.93386774 0.9238477 ]
mean value: 0.9338142147749314
key: test_roc_auc
value: [0.83814935 0.9012987 0.88279221 0.8474026 0.90048701 0.83717532
0.88246753 0.85519481 0.87272727 0.89090909]
mean value: 0.8708603896103897
key: train_roc_auc
value: [0.89665878 0.89966077 0.90466676 0.89766078 0.90575126 0.89972918
0.89872717 0.89572519 0.89579158 0.89178357]
mean value: 0.8986155041005707
key: test_jcc
value: [0.72727273 0.82539683 0.78688525 0.74626866 0.828125 0.73913043
0.8 0.76470588 0.78125 0.80327869]
mean value: 0.780231346094775
key: train_jcc
value: [0.8180212 0.82394366 0.83333333 0.81978799 0.83274021 0.82238011
0.82092199 0.8172232 0.81754386 0.81019332]
mean value: 0.8216088868355006
MCC on Blind test: 0.59
Accuracy on Blind test: 0.82
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.24418879 0.39535999 0.3969667 0.36512113 0.47421598 0.27459502
0.34953403 0.35271382 0.39017797 0.22438264]
mean value: 0.346725606918335
key: score_time
value: [0.01262951 0.02248549 0.01937032 0.01953912 0.01949883 0.01244497
0.01936412 0.01947474 0.02145219 0.02333951]
mean value: 0.018959879875183105
key: test_mcc
value: [0.67762003 0.76590909 0.7658331 0.69959151 0.8049036 0.68237361
0.76868784 0.71884134 0.74743385 0.78181818]
mean value: 0.7413012150181251
key: train_mcc
value: [0.79490575 0.80677119 0.81303246 0.79701682 0.81334074 0.80086148
0.79895075 0.79369889 0.79388846 0.7851833 ]
mean value: 0.7997649837559726
key: test_accuracy
value: [0.83783784 0.88288288 0.88288288 0.84684685 0.9009009 0.83783784
0.88288288 0.85585586 0.87272727 0.89090909]
mean value: 0.8691564291564291
key: train_accuracy
value: [0.89669007 0.90270812 0.90471414 0.89769308 0.90571715 0.8996991
0.89869609 0.89568706 0.89579158 0.89178357]
mean value: 0.8989179964743931
key: test_fscore
value: [0.84210526 0.88288288 0.88073394 0.85470085 0.90598291 0.85
0.88888889 0.86666667 0.87719298 0.89090909]
mean value: 0.8740063480599454
key: train_fscore
value: [0.89990282 0.90555015 0.90909091 0.90097087 0.90873786 0.90253411
0.90165531 0.89941973 0.8996139 0.89514563]
mean value: 0.9022621290949477
key: test_precision
value: [0.81355932 0.875 0.88888889 0.80645161 0.86885246 0.796875
0.85245902 0.8125 0.84745763 0.89090909]
mean value: 0.8452953017263584
key: train_precision
value: [0.87358491 0.88068182 0.86996337 0.87382298 0.87969925 0.87689394
0.87523629 0.86753731 0.86778399 0.86817326]
mean value: 0.873337710827275
key: test_recall
value: [0.87272727 0.89090909 0.87272727 0.90909091 0.94642857 0.91071429
0.92857143 0.92857143 0.90909091 0.89090909]
mean value: 0.905974025974026
key: train_recall
value: [0.92785571 0.93186373 0.95190381 0.92985972 0.93975904 0.92971888
0.92971888 0.93373494 0.93386774 0.9238477 ]
mean value: 0.9332130123701218
key: test_roc_auc
value: [0.83814935 0.88295455 0.88279221 0.8474026 0.90048701 0.83717532
0.88246753 0.85519481 0.87272727 0.89090909]
mean value: 0.869025974025974
key: train_roc_auc
value: [0.89665878 0.90267885 0.90466676 0.89766078 0.90575126 0.89972918
0.89872717 0.89572519 0.89579158 0.89178357]
mean value: 0.89891731253672
key: test_jcc
value: [0.72727273 0.79032258 0.78688525 0.74626866 0.828125 0.73913043
0.8 0.76470588 0.78125 0.80327869]
mean value: 0.7767239216196086
key: train_jcc
value: [0.8180212 0.82740214 0.83333333 0.81978799 0.83274021 0.82238011
0.82092199 0.8172232 0.81754386 0.81019332]
mean value: 0.8219547341614492
MCC on Blind test: 0.59
Accuracy on Blind test: 0.82
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04029536 0.04114866 0.03460741 0.02869368 0.06265116 0.03562665
0.03452373 0.04251862 0.0342226 0.02918482]
mean value: 0.038347268104553224
key: score_time
value: [0.01190257 0.01195717 0.01280403 0.01180339 0.01205182 0.01489973
0.01278424 0.01291847 0.01207256 0.01189375]
mean value: 0.012508773803710937
key: test_mcc
value: [0.95227002 0.80907152 0.7098505 0.75714286 0.76500781 0.70714286
0.8047619 0.67700771 0.40824829 0.8510645 ]
mean value: 0.7441567958533553
key: train_mcc
value: [0.82637697 0.80399267 0.82045348 0.80983264 0.82641807 0.8095776
0.8204801 0.84259028 0.8804868 0.80453796]
mean value: 0.8244746575544734
key: test_accuracy
value: [0.97560976 0.90243902 0.85365854 0.87804878 0.87804878 0.85365854
0.90243902 0.82926829 0.7 0.925 ]
mean value: 0.8698170731707318
key: train_accuracy
value: [0.91280654 0.90190736 0.91008174 0.90463215 0.91280654 0.90463215
0.91008174 0.92098093 0.94021739 0.90217391]
mean value: 0.9120320459661178
key: test_fscore
value: [0.97435897 0.9047619 0.84210526 0.87804878 0.87179487 0.85714286
0.9047619 0.85106383 0.72727273 0.92682927]
mean value: 0.8738140381818856
key: train_fscore
value: [0.91489362 0.90322581 0.91152815 0.90666667 0.9144385 0.90566038
0.91105121 0.92225201 0.94054054 0.90322581]
mean value: 0.9133482690959911
key: test_precision
value: [1. 0.86363636 0.88888889 0.85714286 0.94444444 0.85714286
0.9047619 0.76923077 0.66666667 0.9047619 ]
mean value: 0.8656676656676656
key: train_precision
value: [0.89583333 0.89361702 0.8994709 0.89005236 0.89528796 0.89361702
0.89893617 0.90526316 0.93548387 0.89361702]
mean value: 0.900117880984539
key: test_recall
value: [0.95 0.95 0.8 0.9 0.80952381 0.85714286
0.9047619 0.95238095 0.8 0.95 ]
mean value: 0.8873809523809524
key: train_recall
value: [0.93478261 0.91304348 0.92391304 0.92391304 0.93442623 0.91803279
0.92349727 0.93989071 0.94565217 0.91304348]
mean value: 0.9270194820622476
key: test_roc_auc
value: [0.975 0.90357143 0.85238095 0.87857143 0.8797619 0.85357143
0.90238095 0.82619048 0.7 0.925 ]
mean value: 0.8696428571428572
key: train_roc_auc
value: [0.9127465 0.90187693 0.91004395 0.90457947 0.91286529 0.90466857
0.9101182 0.92103231 0.94021739 0.90217391]
mean value: 0.9120322523164647
key: test_jcc
value: [0.95 0.82608696 0.72727273 0.7826087 0.77272727 0.75
0.82608696 0.74074074 0.57142857 0.86363636]
mean value: 0.7810588284501327
key: train_jcc
value: [0.84313725 0.82352941 0.83743842 0.82926829 0.84236453 0.82758621
0.83663366 0.85572139 0.8877551 0.82352941]
mean value: 0.8406963692117855
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.78683686 0.84267449 0.76527405 0.79202271 0.88664556 0.84797692
0.83675766 0.7930088 0.73645806 0.82136631]
mean value: 0.8109021425247193
key: score_time
value: [0.01321769 0.01195598 0.01218224 0.01203346 0.01193094 0.0118413
0.0120554 0.01197004 0.01200318 0.01233721]
mean value: 0.012152743339538575
key: test_mcc
value: [0.90238095 0.90692382 0.66432098 0.76500781 0.76500781 0.61152662
0.8047619 0.7197263 0.40824829 0.95118973]
mean value: 0.7499094216289688
key: train_mcc
value: [0.84255766 0.78218766 0.84206619 0.78774111 0.84228511 0.77715388
0.82581835 0.86440241 0.82628223 0.82652651]
mean value: 0.8217021104343651
key: test_accuracy
value: [0.95121951 0.95121951 0.82926829 0.87804878 0.87804878 0.80487805
0.90243902 0.85365854 0.7 0.975 ]
mean value: 0.8723780487804877
key: train_accuracy
value: [0.92098093 0.89100817 0.92098093 0.89373297 0.92098093 0.88828338
0.91280654 0.93188011 0.91304348 0.91304348]
mean value: 0.9106740907475418
key: test_fscore
value: [0.95 0.95238095 0.81081081 0.88372093 0.87179487 0.81818182
0.9047619 0.86956522 0.72727273 0.97560976]
mean value: 0.8764098988924509
key: train_fscore
value: [0.92266667 0.89247312 0.92183288 0.89544236 0.92183288 0.89008043
0.91351351 0.93297587 0.91397849 0.9144385 ]
mean value: 0.9119234723468699
key: test_precision
value: [0.95 0.90909091 0.88235294 0.82608696 0.94444444 0.7826087
0.9047619 0.8 0.66666667 0.95238095]
mean value: 0.861839347069526
key: train_precision
value: [0.90575916 0.88297872 0.9144385 0.88359788 0.90957447 0.87368421
0.90374332 0.91578947 0.90425532 0.9 ]
mean value: 0.8993821058932191
key: test_recall
value: [0.95 1. 0.75 0.95 0.80952381 0.85714286
0.9047619 0.95238095 0.8 1. ]
mean value: 0.8973809523809524
key: train_recall
value: [0.94021739 0.90217391 0.92934783 0.9076087 0.93442623 0.90710383
0.92349727 0.95081967 0.92391304 0.92934783]
mean value: 0.9248455690187694
key: test_roc_auc
value: [0.95119048 0.95238095 0.82738095 0.8797619 0.8797619 0.80357143
0.90238095 0.85119048 0.7 0.975 ]
mean value: 0.8722619047619047
key: train_roc_auc
value: [0.92092837 0.89097767 0.92095807 0.89369506 0.92101746 0.88833452
0.91283559 0.93193158 0.91304348 0.91304348]
mean value: 0.910676526490853
key: test_jcc
value: [0.9047619 0.90909091 0.68181818 0.79166667 0.77272727 0.69230769
0.82608696 0.76923077 0.57142857 0.95238095]
mean value: 0.7871499876934659
key: train_jcc
value: [0.85643564 0.80582524 0.855 0.81067961 0.855 0.80193237
0.84079602 0.87437186 0.84158416 0.84236453]
mean value: 0.8383989434715573
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01398969 0.01157308 0.00995636 0.00958562 0.00952148 0.00955033
0.00955057 0.00975442 0.01003146 0.00946712]
mean value: 0.010298013687133789
key: score_time
value: [0.01204395 0.00927138 0.00908518 0.00879693 0.0088644 0.00872564
0.00873661 0.00880122 0.0087173 0.00862312]
mean value: 0.009166574478149414
key: test_mcc
value: [0.66432098 0.47003614 0.80817439 0.51190476 0.53864117 0.51551459
0.49692935 0.4373371 0.10206207 0.80403025]
mean value: 0.5348950810033143
key: train_mcc
value: [0.53255474 0.53298192 0.57267693 0.58005934 0.55338634 0.55388717
0.57083084 0.55875455 0.62088704 0.56794151]
mean value: 0.5643960359167794
key: test_accuracy
value: [0.82926829 0.73170732 0.90243902 0.75609756 0.75609756 0.75609756
0.73170732 0.70731707 0.55 0.9 ]
mean value: 0.7620731707317073
key: train_accuracy
value: [0.76566757 0.76566757 0.78474114 0.78746594 0.77656676 0.77656676
0.78474114 0.77929155 0.80978261 0.7826087 ]
mean value: 0.7813099751214311
key: test_fscore
value: [0.81081081 0.74418605 0.89473684 0.75 0.72222222 0.75
0.68571429 0.66666667 0.59090909 0.9047619 ]
mean value: 0.7520007869701872
key: train_fscore
value: [0.75842697 0.75706215 0.77363897 0.77325581 0.77222222 0.76966292
0.77620397 0.77562327 0.81578947 0.77142857]
mean value: 0.77433143190067
key: test_precision
value: [0.88235294 0.69565217 0.94444444 0.75 0.86666667 0.78947368
0.85714286 0.8 0.54166667 0.86363636]
mean value: 0.7991035797857039
key: train_precision
value: [0.78488372 0.78823529 0.81818182 0.83125 0.78531073 0.79190751
0.80588235 0.78651685 0.79081633 0.81325301]
mean value: 0.7996237627596408
key: test_recall
value: [0.75 0.8 0.85 0.75 0.61904762 0.71428571
0.57142857 0.57142857 0.65 0.95 ]
mean value: 0.7226190476190476
key: train_recall
value: [0.73369565 0.72826087 0.73369565 0.72282609 0.75956284 0.74863388
0.74863388 0.76502732 0.8423913 0.73369565]
mean value: 0.7516423140888572
key: test_roc_auc
value: [0.82738095 0.73333333 0.90119048 0.75595238 0.75952381 0.75714286
0.73571429 0.71071429 0.55 0.9 ]
mean value: 0.7630952380952382
key: train_roc_auc
value: [0.76575493 0.76576978 0.78488061 0.78764255 0.77652055 0.77649085
0.78464303 0.77925279 0.80978261 0.7826087 ]
mean value: 0.7813346400570207
key: test_jcc
value: [0.68181818 0.59259259 0.80952381 0.6 0.56521739 0.6
0.52173913 0.5 0.41935484 0.82608696]
mean value: 0.611633290090513
key: train_jcc
value: [0.61085973 0.60909091 0.63084112 0.63033175 0.62895928 0.62557078
0.63425926 0.63348416 0.68888889 0.62790698]
mean value: 0.6320192852709595
MCC on Blind test: 0.43
Accuracy on Blind test: 0.76
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01049089 0.01060438 0.01087594 0.01089048 0.01076055 0.01004434
0.01068139 0.01085758 0.0104301 0.01090169]
mean value: 0.01065373420715332
key: score_time
value: [0.00968719 0.00909781 0.00950837 0.0095396 0.00881219 0.00950313
0.00953436 0.00951672 0.00924182 0.00943446]
mean value: 0.009387564659118653
key: test_mcc
value: [0.7633652 0.75714286 0.7633652 0.56086079 0.49692935 0.6133669
0.71121921 0.56086079 0. 0.7 ]
mean value: 0.5927110280741841
key: train_mcc
value: [0.68990534 0.61522034 0.71127744 0.70268762 0.67492212 0.70148323
0.69678894 0.69625716 0.70409854 0.69104454]
mean value: 0.6883685277058793
key: test_accuracy
value: [0.87804878 0.87804878 0.87804878 0.7804878 0.73170732 0.80487805
0.85365854 0.7804878 0.5 0.85 ]
mean value: 0.7935365853658537
key: train_accuracy
value: [0.84468665 0.80653951 0.85558583 0.85013624 0.83651226 0.85013624
0.84741144 0.84741144 0.85054348 0.8451087 ]
mean value: 0.8434071792441654
key: test_fscore
value: [0.86486486 0.87804878 0.86486486 0.76923077 0.68571429 0.8
0.85 0.79069767 0.47368421 0.85 ]
mean value: 0.782710545010751
key: train_fscore
value: [0.84210526 0.79886686 0.85479452 0.84419263 0.82954545 0.84507042
0.84090909 0.84180791 0.84330484 0.84122563]
mean value: 0.8381822621430892
key: test_precision
value: [0.94117647 0.85714286 0.94117647 0.78947368 0.85714286 0.84210526
0.89473684 0.77272727 0.5 0.85 ]
mean value: 0.8245681717663141
key: train_precision
value: [0.85875706 0.83431953 0.86187845 0.8816568 0.86390533 0.87209302
0.87573964 0.87134503 0.88622754 0.86285714]
mean value: 0.8668779557223617
key: test_recall
value: [0.8 0.9 0.8 0.75 0.57142857 0.76190476
0.80952381 0.80952381 0.45 0.85 ]
mean value: 0.7502380952380953
key: train_recall
value: [0.82608696 0.76630435 0.84782609 0.80978261 0.79781421 0.81967213
0.80874317 0.81420765 0.80434783 0.82065217]
mean value: 0.8115437158469946
key: test_roc_auc
value: [0.87619048 0.87857143 0.87619048 0.7797619 0.73571429 0.80595238
0.8547619 0.7797619 0.5 0.85 ]
mean value: 0.7936904761904762
key: train_roc_auc
value: [0.84473747 0.80664944 0.85560703 0.8502465 0.8364071 0.85005346
0.84730637 0.84732122 0.85054348 0.8451087 ]
mean value: 0.8433980755523878
key: test_jcc
value: [0.76190476 0.7826087 0.76190476 0.625 0.52173913 0.66666667
0.73913043 0.65384615 0.31034483 0.73913043]
mean value: 0.6562275867560725
key: train_jcc
value: [0.72727273 0.66509434 0.74641148 0.73039216 0.70873786 0.73170732
0.7254902 0.72682927 0.72906404 0.72596154]
mean value: 0.7216960930404063
MCC on Blind test: 0.54
Accuracy on Blind test: 0.8
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01083994 0.01040053 0.01038337 0.00928807 0.01022983 0.01036191
0.01036453 0.01052999 0.0103054 0.00929523]
mean value: 0.010199880599975586
key: score_time
value: [0.01751852 0.01245236 0.01286888 0.01144576 0.01214862 0.0126729
0.01281142 0.01234031 0.01376772 0.01659131]
mean value: 0.013461780548095704
key: test_mcc
value: [ 0.47003614 0.56836003 0.2681441 0.41487884 0.17506448 0.46428571
0.51551459 0.31960727 -0.05057217 0.60302269]
mean value: 0.3748341704030617
key: train_mcc
value: [0.60767426 0.57504013 0.5924094 0.6304166 0.62959664 0.59166588
0.57083084 0.61348256 0.63047203 0.60873161]
mean value: 0.6050319937989842
key: test_accuracy
value: [0.73170732 0.7804878 0.63414634 0.70731707 0.58536585 0.73170732
0.75609756 0.65853659 0.475 0.8 ]
mean value: 0.6860365853658537
key: train_accuracy
value: [0.80381471 0.78746594 0.79564033 0.8147139 0.8147139 0.79564033
0.78474114 0.80653951 0.81521739 0.80434783]
mean value: 0.8022834972159697
key: test_fscore
value: [0.74418605 0.79069767 0.59459459 0.68421053 0.56410256 0.73170732
0.75 0.69565217 0.43243243 0.78947368]
mean value: 0.6777057013572354
key: train_fscore
value: [0.80327869 0.79032258 0.78991597 0.81005587 0.81621622 0.79108635
0.77620397 0.80222841 0.81420765 0.80540541]
mean value: 0.7998921102609804
key: test_precision
value: [0.69565217 0.73913043 0.64705882 0.72222222 0.61111111 0.75
0.78947368 0.64 0.47058824 0.83333333]
mean value: 0.6898570018396375
key: train_precision
value: [0.80769231 0.78191489 0.8150289 0.83333333 0.80748663 0.80681818
0.80588235 0.81818182 0.81868132 0.80107527]
mean value: 0.8096095007832509
key: test_recall
value: [0.8 0.85 0.55 0.65 0.52380952 0.71428571
0.71428571 0.76190476 0.4 0.75 ]
mean value: 0.6714285714285715
key: train_recall
value: [0.79891304 0.79891304 0.76630435 0.78804348 0.82513661 0.77595628
0.74863388 0.78688525 0.80978261 0.80978261]
mean value: 0.7908351152292706
key: test_roc_auc
value: [0.73333333 0.78214286 0.63214286 0.70595238 0.58690476 0.73214286
0.75714286 0.65595238 0.475 0.8 ]
mean value: 0.6860714285714287
key: train_roc_auc
value: [0.80382811 0.78743466 0.79572048 0.81478677 0.81474222 0.79558684
0.78464303 0.8064861 0.81521739 0.80434783]
mean value: 0.8022793418864338
key: test_jcc
value: [0.59259259 0.65384615 0.42307692 0.52 0.39285714 0.57692308
0.6 0.53333333 0.27586207 0.65217391]
mean value: 0.5220665204638218
key: train_jcc
value: [0.67123288 0.65333333 0.65277778 0.68075117 0.68949772 0.65437788
0.63425926 0.66976744 0.68663594 0.67420814]
mean value: 0.6666841549228235
MCC on Blind test: 0.4
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01670408 0.01670146 0.01674366 0.01667166 0.01650667 0.01689339
0.01668477 0.01658773 0.01667404 0.01697803]
mean value: 0.01671454906463623
key: score_time
value: [0.01076055 0.01121688 0.01057029 0.0105505 0.01071262 0.01083326
0.01067638 0.01060128 0.01066852 0.01076341]
mean value: 0.010735368728637696
key: test_mcc
value: [0.90238095 0.95238095 0.8047619 0.70714286 0.71121921 0.65871309
0.75714286 0.67700771 0.20100756 0.8510645 ]
mean value: 0.7222821594970632
key: train_mcc
value: [0.73381575 0.75529095 0.76091136 0.77150768 0.77800027 0.77131232
0.75506509 0.77667885 0.80989026 0.75070989]
mean value: 0.7663182423175936
key: test_accuracy
value: [0.95121951 0.97560976 0.90243902 0.85365854 0.85365854 0.82926829
0.87804878 0.82926829 0.6 0.925 ]
mean value: 0.8598170731707317
key: train_accuracy
value: [0.86648501 0.8773842 0.88010899 0.88555858 0.88828338 0.88555858
0.8773842 0.88828338 0.9048913 0.875 ]
mean value: 0.8828937625873712
key: test_fscore
value: [0.95 0.97560976 0.9 0.85 0.85 0.8372093
0.87804878 0.85106383 0.61904762 0.92682927]
mean value: 0.8637808556038483
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
key: train_fscore
value: [0.87002653 0.88 0.88297872 0.88770053 0.89124668 0.88648649
0.8787062 0.88888889 0.90566038 0.87765957]
mean value: 0.8849353994375553
key: test_precision
value: [0.95 0.95238095 0.9 0.85 0.89473684 0.81818182
0.9 0.76923077 0.59090909 0.9047619 ]
mean value: 0.8530201377569798
key: train_precision
value: [0.84974093 0.86387435 0.86458333 0.87368421 0.86597938 0.87700535
0.86702128 0.88172043 0.89839572 0.859375 ]
mean value: 0.8701379979717162
key: test_recall
value: [0.95 1. 0.9 0.85 0.80952381 0.85714286
0.85714286 0.95238095 0.65 0.95 ]
mean value: 0.8776190476190476
key: train_recall
value: [0.89130435 0.89673913 0.90217391 0.90217391 0.91803279 0.89617486
0.89071038 0.89617486 0.91304348 0.89673913]
mean value: 0.9003266809218342
key: test_roc_auc
value: [0.95119048 0.97619048 0.90238095 0.85357143 0.8547619 0.82857143
0.87857143 0.82619048 0.6 0.925 ]
mean value: 0.8596428571428572
key: train_roc_auc
value: [0.8664172 0.87733131 0.88004871 0.88551319 0.88836422 0.88558743
0.87742041 0.88830482 0.9048913 0.875 ]
mean value: 0.882887859349014
key: test_jcc
value: [0.9047619 0.95238095 0.81818182 0.73913043 0.73913043 0.72
0.7826087 0.74074074 0.44827586 0.86363636]
mean value: 0.7708847206988136
key: train_jcc
value: [0.76995305 0.78571429 0.79047619 0.79807692 0.80382775 0.7961165
0.78365385 0.8 0.82758621 0.78199052]
mean value: 0.7937395281338545
MCC on Blind test: 0.62
Accuracy on Blind test: 0.83
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.29302669 1.45275974 1.41800356 1.26620889 1.44646573 1.27663016
1.4107101 1.35374713 1.28098321 1.41262245]
mean value: 1.3611157655715942
key: score_time
value: [0.01493073 0.01485372 0.01487899 0.01501226 0.01526451 0.01565886
0.01561046 0.01576424 0.01891017 0.01555133]
mean value: 0.01564352512359619
key: test_mcc
value: [0.75714286 0.76500781 0.70714286 0.85441771 0.66668392 0.6133669
0.71121921 0.57570364 0.464758 0.8510645 ]
mean value: 0.6966507394206809
key: train_mcc
value: [0.99456506 0.98366595 0.98910074 0.98366595 0.98366547 0.98366547
0.98366547 1. 0.98913043 0.98913043]
mean value: 0.9880254978024888
key: test_accuracy
value: [0.87804878 0.87804878 0.85365854 0.92682927 0.82926829 0.80487805
0.85365854 0.7804878 0.725 0.925 ]
mean value: 0.8454878048780488
key: train_accuracy
value: [0.9972752 0.99182561 0.99455041 0.99182561 0.99182561 0.99182561
0.99182561 1. 0.99456522 0.99456522]
mean value: 0.9940084113256723
key: test_fscore
value: [0.87804878 0.88372093 0.85 0.92307692 0.82051282 0.8
0.85 0.80851064 0.75555556 0.92682927]
mean value: 0.8496254916456217
key: train_fscore
value: [0.99728997 0.99182561 0.99456522 0.99182561 0.99178082 0.99178082
0.99178082 1. 0.99456522 0.99456522]
mean value: 0.9939979316985105
key: test_precision
value: [0.85714286 0.82608696 0.85 0.94736842 0.88888889 0.84210526
0.89473684 0.73076923 0.68 0.9047619 ]
mean value: 0.842186036440041
key: train_precision
value: [0.99459459 0.99453552 0.99456522 0.99453552 0.99450549 0.99450549
0.99450549 1. 0.99456522 0.99456522]
mean value: 0.9950877768536357
key: test_recall
value: [0.9 0.95 0.85 0.9 0.76190476 0.76190476
0.80952381 0.9047619 0.85 0.95 ]
mean value: 0.8638095238095238
key: train_recall
value: [1. 0.98913043 0.99456522 0.98913043 0.98907104 0.98907104
0.98907104 1. 0.99456522 0.99456522]
mean value: 0.9929169636493229
key: test_roc_auc
value: [0.87857143 0.8797619 0.85357143 0.92619048 0.83095238 0.80595238
0.8547619 0.77738095 0.725 0.925 ]
mean value: 0.8457142857142858
key: train_roc_auc
value: [0.99726776 0.99183298 0.99455037 0.99183298 0.99181813 0.99181813
0.99181813 1. 0.99456522 0.99456522]
mean value: 0.9940068899976241
key: test_jcc
value: [0.7826087 0.79166667 0.73913043 0.85714286 0.69565217 0.66666667
0.73913043 0.67857143 0.60714286 0.86363636]
mean value: 0.7421348578957274
key: train_jcc
value: [0.99459459 0.98378378 0.98918919 0.98378378 0.98369565 0.98369565
0.98369565 1. 0.98918919 0.98918919]
mean value: 0.9880816686251469
MCC on Blind test: 0.63
Accuracy on Blind test: 0.84
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03136277 0.02270293 0.02259898 0.02231073 0.02122474 0.02493143
0.02263188 0.02233434 0.02839565 0.02085257]
mean value: 0.023934602737426758
key: score_time
value: [0.01207113 0.00898671 0.00876594 0.00866342 0.00875425 0.01139832
0.0089736 0.00955963 0.00888205 0.00883341]
mean value: 0.009488844871520996
key: test_mcc
value: [1. 0.85441771 0.7098505 0.90238095 0.8047619 0.80817439
0.90649828 0.90649828 0.65743826 0.70352647]
mean value: 0.8253546749346465
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.92682927 0.85365854 0.95121951 0.90243902 0.90243902
0.95121951 0.95121951 0.825 0.85 ]
mean value: 0.9114024390243902
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.92307692 0.84210526 0.95 0.9047619 0.90909091
0.95454545 0.95454545 0.8372093 0.85714286]
mean value: 0.913247806864698
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.94736842 0.88888889 0.95 0.9047619 0.86956522
0.91304348 0.91304348 0.7826087 0.81818182]
mean value: 0.8987461902450461
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.9 0.8 0.95 0.9047619 0.95238095
1. 1. 0.9 0.9 ]
mean value: 0.9307142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.92619048 0.85238095 0.95119048 0.90238095 0.90119048
0.95 0.95 0.825 0.85 ]
mean value: 0.9108333333333333
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.85714286 0.72727273 0.9047619 0.82608696 0.83333333
0.91304348 0.91304348 0.72 0.75 ]
mean value: 0.8444684735554301
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.69
Accuracy on Blind test: 0.87
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.11634302 0.11963344 0.12352252 0.12230539 0.11753988 0.12038803
0.12010169 0.12309289 0.11572337 0.12082148]
mean value: 0.11994717121124268
key: score_time
value: [0.01868987 0.01897264 0.01911902 0.01916981 0.019449 0.01901913
0.01787734 0.01967764 0.01787829 0.01886177]
mean value: 0.018871450424194337
key: test_mcc
value: [0.85441771 0.85441771 0.80817439 0.70714286 0.76500781 0.80817439
0.8047619 0.57570364 0.40824829 0.75858261]
mean value: 0.7344631301111129
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92682927 0.92682927 0.90243902 0.85365854 0.87804878 0.90243902
0.90243902 0.7804878 0.7 0.875 ]
mean value: 0.8648170731707318
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.92307692 0.89473684 0.85 0.87179487 0.90909091
0.9047619 0.80851064 0.72727273 0.88372093]
mean value: 0.8696042669709952
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94736842 0.94736842 0.94444444 0.85 0.94444444 0.86956522
0.9047619 0.73076923 0.66666667 0.82608696]
mean value: 0.8631475707104997
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9 0.9 0.85 0.85 0.80952381 0.95238095
0.9047619 0.9047619 0.8 0.95 ]
mean value: 0.8821428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92619048 0.92619048 0.90119048 0.85357143 0.8797619 0.90119048
0.90238095 0.77738095 0.7 0.875 ]
mean value: 0.8642857142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.85714286 0.80952381 0.73913043 0.77272727 0.83333333
0.82608696 0.67857143 0.57142857 0.79166667]
mean value: 0.7736754187841144
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.64
Accuracy on Blind test: 0.84
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01135373 0.0111177 0.01112485 0.01120734 0.01128149 0.0112555
0.01124287 0.01118231 0.01134777 0.01073694]
mean value: 0.011185050010681152
key: score_time
value: [0.0094831 0.00955534 0.0096519 0.0088768 0.00959039 0.00959277
0.00955296 0.00955844 0.00960636 0.00904131]
mean value: 0.009450936317443847
key: test_mcc
value: [0.51320273 0.36666667 0.51190476 0.51966679 0.41487884 0.61152662
0.65952381 0.57570364 0.25031309 0.45056356]
mean value: 0.4873950495640369
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75609756 0.68292683 0.75609756 0.75609756 0.70731707 0.80487805
0.82926829 0.7804878 0.625 0.725 ]
mean value: 0.7423170731707317
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73684211 0.68292683 0.75 0.72222222 0.72727273 0.81818182
0.82926829 0.80851064 0.61538462 0.71794872]
mean value: 0.7408557966522351
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.66666667 0.75 0.8125 0.69565217 0.7826087
0.85 0.73076923 0.63157895 0.73684211]
mean value: 0.7434395597410471
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.7 0.7 0.75 0.65 0.76190476 0.85714286
0.80952381 0.9047619 0.6 0.7 ]
mean value: 0.7433333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.7547619 0.68333333 0.75595238 0.75357143 0.70595238 0.80357143
0.8297619 0.77738095 0.625 0.725 ]
mean value: 0.7414285714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58333333 0.51851852 0.6 0.56521739 0.57142857 0.69230769
0.70833333 0.67857143 0.44444444 0.56 ]
mean value: 0.592215471324167
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.41
Accuracy on Blind test: 0.7
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.70994759 1.66301012 1.64955759 1.68670607 1.65594816 1.65043163
1.63158131 1.61386442 1.59506416 1.65376139]
mean value: 1.6509872436523438
key: score_time
value: [0.09827518 0.09878492 0.09821153 0.091434 0.09107804 0.09389925
0.09783268 0.09119153 0.09594035 0.09461141]
mean value: 0.09512588977813721
key: test_mcc
value: [0.90238095 0.90238095 0.90238095 0.85441771 0.80907152 0.80817439
0.95227002 0.7633652 0.55629391 0.75858261]
mean value: 0.8209318202983664
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95121951 0.95121951 0.95121951 0.92682927 0.90243902 0.90243902
0.97560976 0.87804878 0.775 0.875 ]
mean value: 0.9089024390243903
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95 0.95 0.95 0.92307692 0.9 0.90909091
0.97674419 0.88888889 0.79069767 0.88372093]
mean value: 0.9122219511754396
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95 0.95 0.95 0.94736842 0.94736842 0.86956522
0.95454545 0.83333333 0.73913043 0.82608696]
mean value: 0.8967398238679702
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95 0.95 0.95 0.9 0.85714286 0.95238095
1. 0.95238095 0.85 0.95 ]
mean value: 0.9311904761904761
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95119048 0.95119048 0.95119048 0.92619048 0.90357143 0.90119048
0.975 0.87619048 0.775 0.875 ]
mean value: 0.9085714285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9047619 0.9047619 0.9047619 0.85714286 0.81818182 0.83333333
0.95454545 0.8 0.65384615 0.79166667]
mean value: 0.8423001998001998
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.76
Accuracy on Blind test: 0.9
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.9235487 0.93965459 0.92365193 1.00056052 0.96404266 0.9338429
0.99391007 0.94137692 0.92965555 0.90762067]
mean value: 0.945786452293396
key: score_time
value: [0.24140787 0.21806431 0.22907925 0.201998 0.23001051 0.21210074
0.33192515 0.26397443 0.2779243 0.21721077]
mean value: 0.24236953258514404
key: test_mcc
value: [0.90238095 0.95238095 0.90238095 0.85441771 0.76500781 0.80817439
0.95227002 0.7633652 0.45056356 0.75858261]
mean value: 0.8109524139209209
key: train_mcc
value: [0.91859058 0.91859058 0.92927584 0.92392019 0.91860262 0.913028
0.90212679 0.92928213 0.94023128 0.9135293 ]
mean value: 0.9207177296240489
key: test_accuracy
value: [0.95121951 0.97560976 0.95121951 0.92682927 0.87804878 0.90243902
0.97560976 0.87804878 0.725 0.875 ]
mean value: 0.9039024390243903
key: train_accuracy
value: [0.95912807 0.95912807 0.96457766 0.96185286 0.95912807 0.95640327
0.95095368 0.96457766 0.9701087 0.95652174]
mean value: 0.9602379753583699
key: test_fscore
value: [0.95 0.97560976 0.95 0.92307692 0.87179487 0.90909091
0.97674419 0.88888889 0.73170732 0.88372093]
mean value: 0.9060633782301395
key: train_fscore
value: [0.95978552 0.95978552 0.96495957 0.96236559 0.95956873 0.95675676
0.95135135 0.96476965 0.9701897 0.95721925]
mean value: 0.9606751647899552
key: test_precision
value: [0.95 0.95238095 0.95 0.94736842 0.94444444 0.86956522
0.95454545 0.83333333 0.71428571 0.82608696]
mean value: 0.8942010493955574
key: train_precision
value: [0.94708995 0.94708995 0.95721925 0.95212766 0.94680851 0.94652406
0.94117647 0.95698925 0.96756757 0.94210526]
mean value: 0.9504697928526207
key: test_recall
value: [0.95 1. 0.95 0.9 0.80952381 0.95238095
1. 0.95238095 0.75 0.95 ]
mean value: 0.9214285714285714
key: train_recall
value: [0.97282609 0.97282609 0.97282609 0.97282609 0.9726776 0.96721311
0.96174863 0.9726776 0.97282609 0.97282609]
mean value: 0.971127346162984
key: test_roc_auc
value: [0.95119048 0.97619048 0.95119048 0.92619048 0.8797619 0.90119048
0.975 0.87619048 0.725 0.875 ]
mean value: 0.9036904761904762
key: train_roc_auc
value: [0.95909064 0.95909064 0.96455512 0.96182288 0.95916488 0.95643264
0.95098301 0.96459967 0.9701087 0.95652174]
mean value: 0.9602369921596579
key: test_jcc
value: [0.9047619 0.95238095 0.9047619 0.85714286 0.77272727 0.83333333
0.95454545 0.8 0.57692308 0.79166667]
mean value: 0.8348243423243423
key: train_jcc
value: [0.92268041 0.92268041 0.93229167 0.92746114 0.92227979 0.91709845
0.90721649 0.93193717 0.94210526 0.91794872]
mean value: 0.9243699518374119
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02452135 0.01085424 0.01013875 0.01106048 0.01107907 0.01072407
0.01107764 0.01099753 0.01002288 0.01183772]
mean value: 0.01223137378692627
key: score_time
value: [0.01152945 0.00886941 0.00989628 0.00964642 0.00978494 0.00939059
0.00967145 0.00969195 0.00964117 0.01014018]
mean value: 0.009826183319091797
key: test_mcc
value: [0.7633652 0.75714286 0.7633652 0.56086079 0.49692935 0.6133669
0.71121921 0.56086079 0. 0.7 ]
mean value: 0.5927110280741841
key: train_mcc
value: [0.68990534 0.61522034 0.71127744 0.70268762 0.67492212 0.70148323
0.69678894 0.69625716 0.70409854 0.69104454]
mean value: 0.6883685277058793
key: test_accuracy
value: [0.87804878 0.87804878 0.87804878 0.7804878 0.73170732 0.80487805
0.85365854 0.7804878 0.5 0.85 ]
mean value: 0.7935365853658537
key: train_accuracy
value: [0.84468665 0.80653951 0.85558583 0.85013624 0.83651226 0.85013624
0.84741144 0.84741144 0.85054348 0.8451087 ]
mean value: 0.8434071792441654
key: test_fscore
value: [0.86486486 0.87804878 0.86486486 0.76923077 0.68571429 0.8
0.85 0.79069767 0.47368421 0.85 ]
mean value: 0.782710545010751
key: train_fscore
value: [0.84210526 0.79886686 0.85479452 0.84419263 0.82954545 0.84507042
0.84090909 0.84180791 0.84330484 0.84122563]
mean value: 0.8381822621430892
key: test_precision
value: [0.94117647 0.85714286 0.94117647 0.78947368 0.85714286 0.84210526
0.89473684 0.77272727 0.5 0.85 ]
mean value: 0.8245681717663141
key: train_precision
value: [0.85875706 0.83431953 0.86187845 0.8816568 0.86390533 0.87209302
0.87573964 0.87134503 0.88622754 0.86285714]
mean value: 0.8668779557223617
key: test_recall
value: [0.8 0.9 0.8 0.75 0.57142857 0.76190476
0.80952381 0.80952381 0.45 0.85 ]
mean value: 0.7502380952380953
key: train_recall
value: [0.82608696 0.76630435 0.84782609 0.80978261 0.79781421 0.81967213
0.80874317 0.81420765 0.80434783 0.82065217]
mean value: 0.8115437158469946
key: test_roc_auc
value: [0.87619048 0.87857143 0.87619048 0.7797619 0.73571429 0.80595238
0.8547619 0.7797619 0.5 0.85 ]
mean value: 0.7936904761904762
key: train_roc_auc
value: [0.84473747 0.80664944 0.85560703 0.8502465 0.8364071 0.85005346
0.84730637 0.84732122 0.85054348 0.8451087 ]
mean value: 0.8433980755523878
key: test_jcc
value: [0.76190476 0.7826087 0.76190476 0.625 0.52173913 0.66666667
0.73913043 0.65384615 0.31034483 0.73913043]
mean value: 0.6562275867560725
key: train_jcc
value: [0.72727273 0.66509434 0.74641148 0.73039216 0.70873786 0.73170732
0.7254902 0.72682927 0.72906404 0.72596154]
mean value: 0.7216960930404063
MCC on Blind test: 0.54
Accuracy on Blind test: 0.8
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.07918167 0.07073474 0.07140946 0.07169986 0.06777453 0.06753397
0.06917024 0.07282233 0.23122311 0.0630393 ]
mean value: 0.08645892143249512
key: score_time
value: [0.01080298 0.01173496 0.01048946 0.0110333 0.01068783 0.01041865
0.01061344 0.01113892 0.01103044 0.01089954]
mean value: 0.010884952545166016
key: test_mcc
value: [1. 0.95238095 0.8047619 0.90238095 0.8547619 0.90649828
0.95227002 0.86240942 0.75093926 0.8510645 ]
mean value: 0.8837467182165777
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.97560976 0.90243902 0.95121951 0.92682927 0.95121951
0.97560976 0.92682927 0.875 0.925 ]
mean value: 0.9409756097560975
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.97560976 0.9 0.95 0.92682927 0.95454545
0.97674419 0.93333333 0.87804878 0.92682927]
mean value: 0.9421940047096031
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.95238095 0.9 0.95 0.95 0.91304348
0.95454545 0.875 0.85714286 0.9047619 ]
mean value: 0.9256874647092038
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.9 0.95 0.9047619 1. 1.
1. 0.9 0.95 ]
mean value: 0.9604761904761905
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.97619048 0.90238095 0.95119048 0.92738095 0.95
0.975 0.925 0.875 0.925 ]
mean value: 0.9407142857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.95238095 0.81818182 0.9047619 0.86363636 0.91304348
0.95454545 0.875 0.7826087 0.86363636]
mean value: 0.8927795031055901
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04169512 0.08199573 0.03944111 0.04006433 0.07459378 0.03755236
0.03834867 0.08876681 0.0664711 0.06832337]
mean value: 0.05772523880004883
key: score_time
value: [0.02241969 0.01253891 0.01246405 0.01248622 0.01242995 0.01240349
0.01240277 0.02495623 0.02107215 0.02245378]
mean value: 0.016562724113464357
key: test_mcc
value: [0.7565654 0.76500781 0.61969655 0.65871309 0.56190476 0.65871309
0.8047619 0.60952381 0.15171652 0.75858261]
mean value: 0.6345185543993015
key: train_mcc
value: [0.86920884 0.88556918 0.86942317 0.88016394 0.89657222 0.86966229
0.85830957 0.88567102 0.90217391 0.86961659]
mean value: 0.8786370741686671
key: test_accuracy
value: [0.87804878 0.87804878 0.80487805 0.82926829 0.7804878 0.82926829
0.90243902 0.80487805 0.575 0.875 ]
mean value: 0.8157317073170731
key: train_accuracy
value: [0.9346049 0.94277929 0.9346049 0.9400545 0.94822888 0.9346049
0.92915531 0.94277929 0.95108696 0.93478261]
mean value: 0.9392681554318209
key: test_fscore
value: [0.87179487 0.88372093 0.77777778 0.82051282 0.7804878 0.8372093
0.9047619 0.80952381 0.60465116 0.88372093]
mean value: 0.8174161314830629
key: train_fscore
value: [0.93478261 0.94308943 0.93406593 0.93989071 0.9476584 0.93333333
0.92896175 0.94214876 0.95108696 0.93442623]
mean value: 0.9389444114569994
key: test_precision
value: [0.89473684 0.82608696 0.875 0.84210526 0.8 0.81818182
0.9047619 0.80952381 0.56521739 0.82608696]
mean value: 0.8161700942078517
key: train_precision
value: [0.93478261 0.94054054 0.94444444 0.94505495 0.95555556 0.94915254
0.92896175 0.95 0.95108696 0.93956044]
mean value: 0.9439139781380078
key: test_recall
value: [0.85 0.95 0.7 0.8 0.76190476 0.85714286
0.9047619 0.80952381 0.65 0.95 ]
mean value: 0.8233333333333334
key: train_recall
value: [0.93478261 0.94565217 0.92391304 0.93478261 0.93989071 0.91803279
0.92896175 0.93442623 0.95108696 0.92934783]
mean value: 0.9340876692801141
key: test_roc_auc
value: [0.87738095 0.8797619 0.80238095 0.82857143 0.78095238 0.82857143
0.90238095 0.8047619 0.575 0.875 ]
mean value: 0.8154761904761905
key: train_roc_auc
value: [0.93460442 0.94277144 0.93463412 0.9400689 0.94820622 0.93455987
0.92915479 0.94275659 0.95108696 0.93478261]
mean value: 0.9392625920646235
key: test_jcc
value: [0.77272727 0.79166667 0.63636364 0.69565217 0.64 0.72
0.82608696 0.68 0.43333333 0.79166667]
mean value: 0.6987496706192359
key: train_jcc
value: [0.87755102 0.89230769 0.87628866 0.88659794 0.90052356 0.875
0.86734694 0.890625 0.90673575 0.87692308]
mean value: 0.8849899637857348
MCC on Blind test: 0.55
Accuracy on Blind test: 0.8
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01761723 0.01124287 0.01097202 0.01078892 0.01084352 0.01088452
0.00967693 0.00969195 0.00969839 0.00965333]
mean value: 0.01110696792602539
key: score_time
value: [0.01886177 0.00984693 0.00961399 0.00942779 0.00946093 0.0087142
0.00870919 0.00880623 0.00862551 0.00872087]
mean value: 0.010078740119934083
key: test_mcc
value: [0.7197263 0.63994524 0.75714286 0.65871309 0.51551459 0.66668392
0.66668392 0.46428571 0. 0.80403025]
mean value: 0.5892725898158476
key: train_mcc
value: [0.63006799 0.5755263 0.65839315 0.69024299 0.61906921 0.61938983
0.63030694 0.62998014 0.67983997 0.6148664 ]
mean value: 0.6347682927775092
key: test_accuracy
value: [0.85365854 0.80487805 0.87804878 0.82926829 0.75609756 0.82926829
0.82926829 0.73170732 0.5 0.9 ]
mean value: 0.791219512195122
key: train_accuracy
value: [0.8147139 0.78746594 0.82833787 0.84468665 0.80926431 0.80926431
0.8147139 0.8147139 0.83967391 0.80706522]
mean value: 0.8169899893377562
key: test_fscore
value: [0.83333333 0.82608696 0.87804878 0.82051282 0.75 0.82051282
0.82051282 0.73170732 0.47368421 0.9047619 ]
mean value: 0.7859160964242731
key: train_fscore
value: [0.81111111 0.78333333 0.82253521 0.84122563 0.80446927 0.80337079
0.80898876 0.81005587 0.8365651 0.80222841]
mean value: 0.8123883481888775
key: test_precision
value: [0.9375 0.73076923 0.85714286 0.84210526 0.78947368 0.88888889
0.88888889 0.75 0.5 0.86363636]
mean value: 0.8048405176694651
key: train_precision
value: [0.82954545 0.80113636 0.85380117 0.86285714 0.82285714 0.8265896
0.83236994 0.82857143 0.85310734 0.82285714]
mean value: 0.8333692727120341
key: test_recall
value: [0.75 0.95 0.9 0.8 0.71428571 0.76190476
0.76190476 0.71428571 0.45 0.95 ]
mean value: 0.7752380952380953
key: train_recall
value: [0.79347826 0.76630435 0.79347826 0.82065217 0.78688525 0.78142077
0.78688525 0.79234973 0.82065217 0.7826087 ]
mean value: 0.7924714896650036
key: test_roc_auc
value: [0.85119048 0.80833333 0.87857143 0.82857143 0.75714286 0.83095238
0.83095238 0.73214286 0.5 0.9 ]
mean value: 0.7917857142857143
key: train_roc_auc
value: [0.81477192 0.78752376 0.82843312 0.84475232 0.80920349 0.80918864
0.81463828 0.81465312 0.83967391 0.80706522]
mean value: 0.8169903777619387
key: test_jcc
value: [0.71428571 0.7037037 0.7826087 0.69565217 0.6 0.69565217
0.69565217 0.57692308 0.31034483 0.82608696]
mean value: 0.6600909496411745
key: train_jcc
value: [0.68224299 0.64383562 0.69856459 0.72596154 0.6728972 0.6713615
0.67924528 0.68075117 0.71904762 0.66976744]
mean value: 0.6843674955100508
MCC on Blind test: 0.55
Accuracy on Blind test: 0.8
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01511908 0.01789021 0.02156711 0.01820374 0.01853132 0.01843572
0.01728106 0.01646686 0.01864266 0.01955032]
mean value: 0.01816880702972412
key: score_time
value: [0.00898719 0.01122427 0.01118827 0.01151872 0.01170158 0.01176333
0.01167345 0.01163459 0.01162744 0.01157379]
mean value: 0.011289262771606445
key: test_mcc
value: [0.698212 0.7197263 0.46494781 0.46494781 0.72229808 0.75714286
0.66496381 0.61152662 0.40824829 0.8510645 ]
mean value: 0.6363078090294618
key: train_mcc
value: [0.65746895 0.78653727 0.5557906 0.53460389 0.83671955 0.84197553
0.6508894 0.84760268 0.89151503 0.74762767]
mean value: 0.7350730570093046
key: test_accuracy
value: [0.82926829 0.85365854 0.68292683 0.68292683 0.85365854 0.87804878
0.80487805 0.80487805 0.7 0.925 ]
mean value: 0.8015243902439024
key: train_accuracy
value: [0.80926431 0.88828338 0.73841962 0.72479564 0.91825613 0.92098093
0.80381471 0.92370572 0.94565217 0.86141304]
mean value: 0.8534585653358607
key: test_fscore
value: [0.78787879 0.83333333 0.51851852 0.51851852 0.84210526 0.87804878
0.76470588 0.81818182 0.72727273 0.92307692]
mean value: 0.7611640552779267
key: train_fscore
value: [0.77124183 0.87905605 0.64963504 0.62453532 0.91891892 0.92098093
0.76 0.92265193 0.94623656 0.8411215 ]
mean value: 0.8234378063262462
key: test_precision
value: [1. 0.9375 1. 1. 0.94117647 0.9
1. 0.7826087 0.66666667 0.94736842]
mean value: 0.9175320253959708
key: train_precision
value: [0.96721311 0.96129032 0.98888889 0.98823529 0.90909091 0.91847826
0.97435897 0.93296089 0.93617021 0.98540146]
mean value: 0.9562088331135449
key: test_recall
value: [0.65 0.75 0.35 0.35 0.76190476 0.85714286
0.61904762 0.85714286 0.8 0.9 ]
mean value: 0.6895238095238095
key: train_recall
value: [0.64130435 0.80978261 0.48369565 0.45652174 0.92896175 0.92349727
0.62295082 0.91256831 0.95652174 0.73369565]
mean value: 0.7469499881206938
key: test_roc_auc
value: [0.825 0.85119048 0.675 0.675 0.85595238 0.87857143
0.80952381 0.80357143 0.7 0.925 ]
mean value: 0.7998809523809524
key: train_roc_auc
value: [0.80972321 0.88849786 0.73911559 0.72552863 0.91828522 0.92098776
0.80332324 0.92367546 0.94565217 0.86141304]
mean value: 0.853620218579235
key: test_jcc
value: [0.65 0.71428571 0.35 0.35 0.72727273 0.7826087
0.61904762 0.69230769 0.57142857 0.85714286]
mean value: 0.6314093877137356
key: train_jcc
value: [0.62765957 0.78421053 0.48108108 0.45405405 0.85 0.85353535
0.61290323 0.85641026 0.89795918 0.72580645]
mean value: 0.7143619706957444
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01723146 0.03948784 0.01812649 0.03602719 0.01928163 0.01920485
0.0195148 0.01716614 0.01644444 0.01767397]
mean value: 0.022015881538391114
key: score_time
value: [0.01169276 0.01263618 0.0126977 0.01592565 0.01178527 0.01175618
0.01171517 0.01169419 0.01178122 0.01174879]
mean value: 0.012343311309814453
key: test_mcc
value: [0.85441771 0.65915306 0.58066054 0.80907152 0.80907152 0.75714286
0.90649828 0.59335232 0.22941573 0.53881591]
mean value: 0.6737599446817913
key: train_mcc
value: [0.83208515 0.66194954 0.64802546 0.84785015 0.86084709 0.88095976
0.79006361 0.84482196 0.33933982 0.48038446]
mean value: 0.7186327015117803
key: test_accuracy
value: [0.92682927 0.80487805 0.7804878 0.90243902 0.90243902 0.87804878
0.95121951 0.7804878 0.55 0.725 ]
mean value: 0.8201829268292683
key: train_accuracy
value: [0.91553134 0.80653951 0.80381471 0.92370572 0.92915531 0.9400545
0.89100817 0.92098093 0.60326087 0.6875 ]
mean value: 0.8421551060300912
key: test_fscore
value: [0.92307692 0.75 0.8 0.9047619 0.9 0.87804878
0.95454545 0.81632653 0.18181818 0.62068966]
mean value: 0.7729267430474928
key: train_fscore
value: [0.91364903 0.76254181 0.83333333 0.92513369 0.93157895 0.94117647
0.89795918 0.92388451 0.34234234 0.54545455]
mean value: 0.8017053858125319
key: test_precision
value: [0.94736842 1. 0.72 0.86363636 0.94736842 0.9
0.91304348 0.71428571 1. 1. ]
mean value: 0.900570239828821
key: train_precision
value: [0.93714286 0.99130435 0.72580645 0.91052632 0.89847716 0.92146597
0.84210526 0.88888889 1. 1. ]
mean value: 0.9115717250364899
key: test_recall
value: [0.9 0.6 0.9 0.95 0.85714286 0.85714286
1. 0.95238095 0.1 0.45 ]
mean value: 0.7566666666666666
key: train_recall
value: [0.89130435 0.61956522 0.97826087 0.94021739 0.96721311 0.96174863
0.96174863 0.96174863 0.20652174 0.375 ]
mean value: 0.7863328581610833
key: test_roc_auc
value: [0.92619048 0.8 0.78333333 0.90357143 0.90357143 0.87857143
0.95 0.77619048 0.55 0.725 ]
mean value: 0.8196428571428571
key: train_roc_auc
value: [0.91559753 0.80705037 0.80333809 0.92366061 0.92925873 0.94011345
0.8912004 0.92109171 0.60326087 0.6875 ]
mean value: 0.8422071751009741
key: test_jcc
value: [0.85714286 0.6 0.66666667 0.82608696 0.81818182 0.7826087
0.91304348 0.68965517 0.1 0.45 ]
mean value: 0.6703385644839918
key: train_jcc
value: [0.84102564 0.61621622 0.71428571 0.86069652 0.87192118 0.88888889
0.81481481 0.85853659 0.20652174 0.375 ]
mean value: 0.7047907299406508
MCC on Blind test: 0.62
Accuracy on Blind test: 0.83
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.18305397 0.1674881 0.16102242 0.16084981 0.15811038 0.15780878
0.15861988 0.16229534 0.16273403 0.16005588]
mean value: 0.16320385932922363
key: score_time
value: [0.01662803 0.01653552 0.01613188 0.01542115 0.01611686 0.01628065
0.01615357 0.01622725 0.01625323 0.01510215]
mean value: 0.016085028648376465
key: test_mcc
value: [1. 1. 0.7565654 0.80817439 0.80907152 0.85441771
1. 0.7633652 0.65081403 0.85972695]
mean value: 0.8502135194128032
key: train_mcc
value: [0.99456506 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994565056421516
key: test_accuracy
value: [1. 1. 0.87804878 0.90243902 0.90243902 0.92682927
1. 0.87804878 0.825 0.925 ]
mean value: 0.9237804878048781
key: train_accuracy
value: [0.9972752 1. 1. 1. 1. 1. 1.
1. 1. 1. ]
mean value: 0.9997275204359672
key: test_fscore
value: [1. 1. 0.87179487 0.89473684 0.9 0.93023256
1. 0.88888889 0.82926829 0.93023256]
mean value: 0.924515401175102
key: train_fscore
value: [0.99728997 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9997289972899729
key: test_precision
value: [1. 1. 0.89473684 0.94444444 0.94736842 0.90909091
1. 0.83333333 0.80952381 0.86956522]
mean value: 0.9208062976941696
key: train_precision
value: [0.99459459 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994594594594595
key: test_recall
value: [1. 1. 0.85 0.85 0.85714286 0.95238095
1. 0.95238095 0.85 1. ]
mean value: 0.9311904761904761
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 1. 0.87738095 0.90119048 0.90357143 0.92619048
1. 0.87619048 0.825 0.925 ]
mean value: 0.9234523809523809
key: train_roc_auc
value: [0.99726776 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9997267759562841
key: test_jcc
value: [1. 1. 0.77272727 0.80952381 0.81818182 0.86956522
1. 0.8 0.70833333 0.86956522]
mean value: 0.8647896668548842
key: train_jcc
value: [0.99459459 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9994594594594595
MCC on Blind test: 0.76
Accuracy on Blind test: 0.9
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06312132 0.05718136 0.0733006 0.07361269 0.07186341 0.06613135
0.07574439 0.06043625 0.05493975 0.07683802]
mean value: 0.06731691360473632
key: score_time
value: [0.0210464 0.02701759 0.02388501 0.03913212 0.02821994 0.0386188
0.03353763 0.02436566 0.03699136 0.03753781]
mean value: 0.031035232543945312
key: test_mcc
value: [1. 0.95238095 0.7098505 0.85441771 0.8547619 0.90649828
0.95227002 0.86240942 0.65081403 0.80403025]
mean value: 0.854743305832982
key: train_mcc
value: [0.99456522 0.97825894 0.97825894 0.98910074 0.97820147 0.98378331
0.99456506 0.98910074 0.98918887 0.99457991]
mean value: 0.9869603176262142
key: test_accuracy
value: [1. 0.97560976 0.85365854 0.92682927 0.92682927 0.95121951
0.97560976 0.92682927 0.825 0.9 ]
mean value: 0.9261585365853658
key: train_accuracy
value: [0.9972752 0.98910082 0.98910082 0.99455041 0.98910082 0.99182561
0.9972752 0.99455041 0.99456522 0.99728261]
mean value: 0.993462711764009
key: test_fscore
value: [1. 0.97560976 0.84210526 0.92307692 0.92682927 0.95454545
0.97674419 0.93333333 0.82926829 0.9047619 ]
mean value: 0.9266274381995193
key: train_fscore
value: [0.9972752 0.98918919 0.98918919 0.99456522 0.98907104 0.99186992
0.99726027 0.99453552 0.99453552 0.9972752 ]
mean value: 0.9934766273663551
key: test_precision
value: [1. 0.95238095 0.88888889 0.94736842 0.95 0.91304348
0.95454545 0.875 0.80952381 0.86363636]
mean value: 0.915438736828897
key: train_precision
value: [1. 0.98387097 0.98387097 0.99456522 0.98907104 0.98387097
1. 0.99453552 1. 1. ]
mean value: 0.992978467799416
key: test_recall
value: [1. 1. 0.8 0.9 0.9047619 1. 1.
1. 0.85 0.95 ]
mean value: 0.9404761904761905
key: train_recall
value: [0.99456522 0.99456522 0.99456522 0.99456522 0.98907104 1.
0.99453552 0.99453552 0.98913043 0.99456522]
mean value: 0.9940098598241862
key: test_roc_auc
value: [1. 0.97619048 0.85238095 0.92619048 0.92738095 0.95
0.975 0.925 0.825 0.9 ]
mean value: 0.9257142857142857
key: train_roc_auc
value: [0.99728261 0.98908589 0.98908589 0.99455037 0.98910074 0.99184783
0.99726776 0.99455037 0.99456522 0.99728261]
mean value: 0.9934619268234735
key: test_jcc
value: [1. 0.95238095 0.72727273 0.85714286 0.86363636 0.91304348
0.95454545 0.875 0.70833333 0.82608696]
mean value: 0.8677442123094297
key: train_jcc
value: [0.99456522 0.97860963 0.97860963 0.98918919 0.97837838 0.98387097
0.99453552 0.98913043 0.98913043 0.99456522]
mean value: 0.987058461011991
MCC on Blind test: 0.75
Accuracy on Blind test: 0.89
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.09550643 0.09747839 0.07713652 0.11133432 0.1167531 0.05979848
0.10996175 0.11596441 0.05582666 0.06558204]
mean value: 0.09053421020507812
key: score_time
value: [0.02205396 0.0137794 0.02239418 0.01748085 0.01414633 0.01440907
0.02215457 0.02253461 0.01375127 0.02219558]
mean value: 0.018489980697631837
key: test_mcc
value: [0.75714286 0.71121921 0.56190476 0.41428571 0.12142857 0.41766229
0.51551459 0.47439956 0. 0.65081403]
mean value: 0.46243715854258727
key: train_mcc
value: [1. 0.99456506 0.99456506 0.99456506 1. 0.99456522
0.99456522 1. 0.99457991 0.99457991]
mean value: 0.9961985415816125
key: test_accuracy
value: [0.87804878 0.85365854 0.7804878 0.70731707 0.56097561 0.70731707
0.75609756 0.73170732 0.5 0.825 ]
mean value: 0.7300609756097561
key: train_accuracy
value: [1. 0.9972752 0.9972752 0.9972752 1. 0.9972752
0.9972752 1. 0.99728261 0.99728261]
mean value: 0.9980941239189669
key: test_fscore
value: [0.87804878 0.85714286 0.7804878 0.7 0.57142857 0.7
0.75 0.76595745 0.47368421 0.82926829]
mean value: 0.7306017963955036
key: train_fscore
value: [1. 0.99728997 0.99728997 0.99728997 1. 0.9972752
0.9972752 1. 0.99728997 0.99728997]
mean value: 0.9981000273217991
key: test_precision
value: [0.85714286 0.81818182 0.76190476 0.7 0.57142857 0.73684211
0.78947368 0.69230769 0.5 0.80952381]
mean value: 0.7236805299963195
key: train_precision
value: [1. 0.99459459 0.99459459 0.99459459 1. 0.99456522
0.99456522 1. 0.99459459 0.99459459]
mean value: 0.9962103407755581
key: test_recall
value: [0.9 0.9 0.8 0.7 0.57142857 0.66666667
0.71428571 0.85714286 0.45 0.85 ]
mean value: 0.7409523809523809
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.87857143 0.8547619 0.78095238 0.70714286 0.56071429 0.70833333
0.75714286 0.72857143 0.5 0.825 ]
mean value: 0.7301190476190477
key: train_roc_auc
value: [1. 0.99726776 0.99726776 0.99726776 1. 0.99728261
0.99728261 1. 0.99728261 0.99728261]
mean value: 0.9980933713471133
key: test_jcc
value: [0.7826087 0.75 0.64 0.53846154 0.4 0.53846154
0.6 0.62068966 0.31034483 0.70833333]
mean value: 0.5888899588667205
key: train_jcc
value: [1. 0.99459459 0.99459459 0.99459459 1. 0.99456522
0.99456522 1. 0.99459459 0.99459459]
mean value: 0.9962103407755581
MCC on Blind test: 0.5
Accuracy on Blind test: 0.78
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.61533713 0.6027925 0.6025629 0.59919167 0.60118747 0.60153985
0.60033226 0.61004019 0.60367298 0.60337877]
mean value: 0.6040035724639893
key: score_time
value: [0.00962615 0.01016498 0.00934196 0.00935721 0.0094378 0.00935674
0.00931764 0.00950098 0.00953007 0.00993609]
mean value: 0.009556961059570313
key: test_mcc
value: [1. 0.95238095 0.7098505 0.90238095 0.8547619 0.86240942
0.90649828 0.86240942 0.70352647 0.75858261]
mean value: 0.8512800501192865
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.97560976 0.85365854 0.95121951 0.92682927 0.92682927
0.95121951 0.92682927 0.85 0.875 ]
mean value: 0.9237195121951219
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.97560976 0.84210526 0.95 0.92682927 0.93333333
0.95454545 0.93333333 0.85714286 0.88372093]
mean value: 0.9256620196135675
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.95238095 0.88888889 0.95 0.95 0.875
0.91304348 0.875 0.81818182 0.82608696]
mean value: 0.9048582094234268
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.8 0.95 0.9047619 1. 1.
1. 0.9 0.95 ]
mean value: 0.9504761904761905
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.97619048 0.85238095 0.95119048 0.92738095 0.925
0.95 0.925 0.85 0.875 ]
mean value: 0.9232142857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.95238095 0.72727273 0.9047619 0.86363636 0.875
0.91304348 0.875 0.75 0.79166667]
mean value: 0.8652762092979485
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.02614141 0.02861094 0.02827191 0.02825522 0.02861452 0.0286746
0.02852798 0.02840257 0.04027748 0.03529644]
mean value: 0.03010730743408203
key: score_time
value: [0.01253939 0.01268291 0.01283693 0.01493955 0.01348925 0.01353598
0.01356745 0.01352119 0.0201292 0.01334429]
mean value: 0.014058613777160644
key: test_mcc
value: [0.30603535 0.41428571 0.36718832 0.46300848 0.4373371 0.56836003
0.46300848 0.56190476 0.30151134 0.40201513]
mean value: 0.42846547057135465
key: train_mcc
value: [0.85739567 0.94039882 0.78808937 0.98378331 0.94600476 0.92132206
0.93083981 0.94639726 0.94615534 0.96208441]
mean value: 0.922247080457379
key: test_accuracy
value: [0.63414634 0.70731707 0.68292683 0.73170732 0.70731707 0.7804878
0.73170732 0.7804878 0.65 0.7 ]
mean value: 0.7106097560975609
key: train_accuracy
value: [0.92370572 0.97002725 0.88555858 0.99182561 0.97275204 0.95912807
0.96457766 0.97275204 0.97282609 0.98097826]
mean value: 0.9594131323302926
key: test_fscore
value: [0.69387755 0.7 0.64864865 0.71794872 0.66666667 0.76923077
0.74418605 0.7804878 0.66666667 0.68421053]
mean value: 0.7071923397887343
key: train_fscore
value: [0.92929293 0.97050938 0.89655172 0.99178082 0.97222222 0.95726496
0.96551724 0.97206704 0.97237569 0.98082192]
mean value: 0.9608403927115273
key: test_precision
value: [0.5862069 0.7 0.70588235 0.73684211 0.8 0.83333333
0.72727273 0.8 0.63636364 0.72222222]
mean value: 0.7248123273947977
key: train_precision
value: [0.86792453 0.95767196 0.81981982 1. 0.98870056 1.
0.93814433 0.99428571 0.98876404 0.98895028]
mean value: 0.9544261236134951
key: test_recall
value: [0.85 0.7 0.6 0.7 0.57142857 0.71428571
0.76190476 0.76190476 0.7 0.65 ]
mean value: 0.7009523809523809
key: train_recall
value: [1. 0.98369565 0.98913043 0.98369565 0.95628415 0.91803279
0.99453552 0.95081967 0.95652174 0.97282609]
mean value: 0.9705541696364932
key: test_roc_auc
value: [0.63928571 0.70714286 0.68095238 0.73095238 0.71071429 0.78214286
0.73095238 0.78095238 0.65 0.7 ]
mean value: 0.7113095238095238
key: train_roc_auc
value: [0.92349727 0.9699899 0.8852756 0.99184783 0.97270729 0.95901639
0.96465906 0.97269244 0.97282609 0.98097826]
mean value: 0.9593490140175813
key: test_jcc
value: [0.53125 0.53846154 0.48 0.56 0.5 0.625
0.59259259 0.64 0.5 0.52 ]
mean value: 0.5487304131054132
key: train_jcc
value: [0.86792453 0.94270833 0.8125 0.98369565 0.94594595 0.91803279
0.93333333 0.94565217 0.94623656 0.96236559]
mean value: 0.9258394904424336
MCC on Blind test: 0.3
Accuracy on Blind test: 0.62
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02562833 0.05641294 0.03336596 0.03528643 0.03723907 0.03774786
0.03155208 0.03795075 0.03786731 0.03780031]
mean value: 0.037085103988647464
key: score_time
value: [0.01957393 0.03293657 0.0218451 0.01995754 0.01814795 0.01629925
0.02359295 0.02411532 0.02410388 0.02387118]
mean value: 0.02244436740875244
key: test_mcc
value: [0.85441771 0.80907152 0.7098505 0.70714286 0.76500781 0.70714286
0.8047619 0.7633652 0.35400522 0.90453403]
mean value: 0.7379299602053512
key: train_mcc
value: [0.83670017 0.84197084 0.82561178 0.83107125 0.84762076 0.83656559
0.83118002 0.85318761 0.88588265 0.82095534]
mean value: 0.8410746004220283
key: test_accuracy
value: [0.92682927 0.90243902 0.85365854 0.85365854 0.87804878 0.85365854
0.90243902 0.87804878 0.675 0.95 ]
mean value: 0.8673780487804879
key: train_accuracy
value: [0.91825613 0.92098093 0.91280654 0.91553134 0.92370572 0.91825613
0.91553134 0.92643052 0.94293478 0.91032609]
mean value: 0.9204759507167397
key: test_fscore
value: [0.92307692 0.9047619 0.84210526 0.85 0.87179487 0.85714286
0.9047619 0.88888889 0.69767442 0.95238095]
mean value: 0.8692587984570849
key: train_fscore
value: [0.91935484 0.92140921 0.91304348 0.91598916 0.92432432 0.91847826
0.91598916 0.92722372 0.94277929 0.91152815]
mean value: 0.9210119597403508
key: test_precision
value: [0.94736842 0.86363636 0.88888889 0.85 0.94444444 0.85714286
0.9047619 0.83333333 0.65217391 0.90909091]
mean value: 0.8650841035394811
key: train_precision
value: [0.90957447 0.91891892 0.91304348 0.91351351 0.9144385 0.91351351
0.90860215 0.91489362 0.94535519 0.8994709 ]
mean value: 0.915132425325236
key: test_recall
value: [0.9 0.95 0.8 0.85 0.80952381 0.85714286
0.9047619 0.95238095 0.75 1. ]
mean value: 0.8773809523809524
key: train_recall
value: [0.92934783 0.92391304 0.91304348 0.91847826 0.93442623 0.92349727
0.92349727 0.93989071 0.94021739 0.92391304]
mean value: 0.9270224518888097
key: test_roc_auc
value: [0.92619048 0.90357143 0.85238095 0.85357143 0.8797619 0.85357143
0.90238095 0.87619048 0.675 0.95 ]
mean value: 0.8672619047619048
key: train_roc_auc
value: [0.91822583 0.92097292 0.91280589 0.91552328 0.92373485 0.91827037
0.91555298 0.92646709 0.94293478 0.91032609]
mean value: 0.920481408885721
key: test_jcc
value: [0.85714286 0.82608696 0.72727273 0.73913043 0.77272727 0.75
0.82608696 0.8 0.53571429 0.90909091]
mean value: 0.7743252399774139
key: train_jcc
value: [0.85074627 0.85427136 0.84 0.845 0.85929648 0.84924623
0.845 0.86432161 0.89175258 0.83743842]
mean value: 0.8537072948013584
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.2605021 0.26175451 0.26747704 0.42438817 0.33081579 0.27082133
0.26918602 0.26025081 0.24329901 0.27351165]
mean value: 0.2862006425857544
key: score_time
value: [0.02257133 0.02124763 0.0225091 0.03519368 0.02244186 0.02156854
0.01980209 0.02273059 0.01768684 0.02059722]
mean value: 0.0226348876953125
key: test_mcc
value: [0.85441771 0.80907152 0.7098505 0.70714286 0.76500781 0.65871309
0.75714286 0.7633652 0.25286087 0.90453403]
mean value: 0.7182106443016628
key: train_mcc
value: [0.83670017 0.84197084 0.78774111 0.83107125 0.84762076 0.76096804
0.76064422 0.85318761 0.80477581 0.82095534]
mean value: 0.8145635147399292
key: test_accuracy
value: [0.92682927 0.90243902 0.85365854 0.85365854 0.87804878 0.82926829
0.87804878 0.87804878 0.625 0.95 ]
mean value: 0.8575
key: train_accuracy
value: [0.91825613 0.92098093 0.89373297 0.91553134 0.92370572 0.88010899
0.88010899 0.92643052 0.90217391 0.91032609]
mean value: 0.9071355585831062
key: test_fscore
value: [0.92307692 0.9047619 0.84210526 0.85 0.87179487 0.8372093
0.87804878 0.88888889 0.65116279 0.95238095]
mean value: 0.8599429677572497
key: train_fscore
value:
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.91935484 0.92140921 0.89544236 0.91598916 0.92432432 0.88235294
0.88172043 0.92722372 0.90374332 0.91152815]
mean value: 0.9083088452869689
key: test_precision
value: [0.94736842 0.86363636 0.88888889 0.85 0.94444444 0.81818182
0.9 0.83333333 0.60869565 0.90909091]
mean value: 0.8563639830802302
key: train_precision
value: [0.90957447 0.91891892 0.88359788 0.91351351 0.9144385 0.86387435
0.86772487 0.91489362 0.88947368 0.8994709 ]
mean value: 0.8975480700766527
key: test_recall
value: [0.9 0.95 0.8 0.85 0.80952381 0.85714286
0.85714286 0.95238095 0.7 1. ]
mean value: 0.8676190476190476
key: train_recall
value: [0.92934783 0.92391304 0.9076087 0.91847826 0.93442623 0.90163934
0.89617486 0.93989071 0.91847826 0.92391304]
mean value: 0.9193870277975766
key: test_roc_auc
value: [0.92619048 0.90357143 0.85238095 0.85357143 0.8797619 0.82857143
0.87857143 0.87619048 0.625 0.95 ]
mean value: 0.8573809523809524
key: train_roc_auc
value: [0.91822583 0.92097292 0.89369506 0.91552328 0.92373485 0.8801675
0.88015265 0.92646709 0.90217391 0.91032609]
mean value: 0.9071439177952008
key: test_jcc
value: [0.85714286 0.82608696 0.72727273 0.73913043 0.77272727 0.72
0.7826087 0.8 0.48275862 0.90909091]
mean value: 0.7616818473879943
key: train_jcc
value: [0.85074627 0.85427136 0.81067961 0.845 0.85929648 0.78947368
0.78846154 0.86432161 0.82439024 0.83743842]
mean value: 0.8324079217763206
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
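
The per-fold scores in the block above all derive from one confusion matrix per fold. As a worked check, the sketch below recomputes fold 9 of the RidgeClassifierCV run (test_mcc 0.25286087, the weakest fold). The counts TP=14, FP=9, FN=6, TN=11 are not printed in this log; they are an assumption inferred from the logged precision (0.60869565 = 14/23), recall (0.7 = 14/20) and accuracy (0.625 = 25/40). The function name `fold_metrics` is hypothetical.

```python
import math


def fold_metrics(tp, fp, fn, tn):
    """Return the binary-classification scores reported in the log for one fold."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Counts inferred from the logged fold-9 scores (assumption, not in the log).
scores = fold_metrics(tp=14, fp=9, fn=6, tn=11)
for name, value in scores.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the fold's accuracy, F-score, Jaccard, ROC AUC and MCC all agree with the logged values to the printed precision. This suggests the low MCC reflects nine false positives in a 40-sample fold, not a reporting error.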
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04052091 0.04298687 0.05086136 0.05352783 0.04348326 0.04401159
0.04306316 0.05227876 0.05232954 0.053303 ]
mean value: 0.04763662815093994
key: score_time
value: [0.01334476 0.01325011 0.0122745 0.01240516 0.02166629 0.01318216
0.01324201 0.01337886 0.01333618 0.0133636 ]
mean value: 0.013944363594055176
key: test_mcc
value: [0.63990931 0.71955846 0.69383117 0.76905945 0.8049036 0.74951538
0.82027988 0.7306455 0.76989735 0.6634888 ]
mean value: 0.7361088913855512
key: train_mcc
value: [0.78114195 0.78136227 0.78586682 0.79445673 0.76450976 0.78590099
0.78943823 0.77028847 0.78900234 0.77715074]
mean value: 0.7819118302880803
key: test_accuracy
value: [0.81981982 0.85585586 0.84684685 0.88288288 0.9009009 0.87387387
0.90990991 0.86486486 0.88181818 0.82727273]
mean value: 0.8664045864045864
key: train_accuracy
value: [0.88966901 0.88966901 0.89167503 0.89568706 0.88164493 0.89167503
0.89368104 0.88465396 0.89378758 0.88777555]
mean value: 0.8899918191448092
key: test_fscore
value: [0.81481481 0.86440678 0.84684685 0.88695652 0.90598291 0.87931034
0.9122807 0.86956522 0.88888889 0.84033613]
mean value: 0.8709389156360662
key: train_fscore
value: [0.89341085 0.89361702 0.89595376 0.90019194 0.88476562 0.8957529
0.89728682 0.88736533 0.89688716 0.89126214]
mean value: 0.8936493535818284
key: test_precision
value: [0.83018868 0.80952381 0.83928571 0.85 0.86885246 0.85
0.89655172 0.84745763 0.83870968 0.78125 ]
mean value: 0.841181969074713
key: train_precision
value: [0.86491557 0.8635514 0.86270872 0.86372007 0.86121673 0.86245353
0.8670412 0.86615679 0.87145558 0.86440678]
mean value: 0.8647626371740085
key: test_recall
value: [0.8 0.92727273 0.85454545 0.92727273 0.94642857 0.91071429
0.92857143 0.89285714 0.94545455 0.90909091]
mean value: 0.9042207792207793
key: train_recall
value: [0.9238477 0.9258517 0.93186373 0.93987976 0.90963855 0.93172691
0.92971888 0.90963855 0.9238477 0.91983968]
mean value: 0.9245853152087308
key: test_roc_auc
value: [0.81964286 0.85649351 0.84691558 0.88327922 0.90048701 0.87353896
0.90974026 0.86461039 0.88181818 0.82727273]
mean value: 0.8663798701298701
key: train_roc_auc
value: [0.88963469 0.88963268 0.89163467 0.89564269 0.88167298 0.89171516
0.89371715 0.884679 0.89378758 0.88777555]
mean value: 0.8899892153785482
key: test_jcc
value: [0.6875 0.76119403 0.734375 0.796875 0.828125 0.78461538
0.83870968 0.76923077 0.8 0.72463768]
mean value: 0.7725262542275675
key: train_jcc
value: [0.80735552 0.80769231 0.81151832 0.81849913 0.79334501 0.81118881
0.81370826 0.79753521 0.81305115 0.80385289]
mean value: 0.8077746603706929
MCC on Blind test: 0.64
Accuracy on Blind test: 0.84
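
Each metric in this log is printed as a `key:` / `value:` / `mean value:` triple, which makes the records straightforward to parse and cross-check. The sketch below is one possible way to do that using only the standard library; the names `RECORD` and `parse_records` are hypothetical, and the excerpt is copied from the Logistic Regression block above.

```python
import re
from statistics import fmean

# Excerpt copied from the Logistic Regression block of this log.
LOG_EXCERPT = """\
key: test_mcc
value: [0.63990931 0.71955846 0.69383117 0.76905945 0.8049036 0.74951538
0.82027988 0.7306455 0.76989735 0.6634888 ]
mean value: 0.7361088913855512
key: train_mcc
value: [0.78114195 0.78136227 0.78586682 0.79445673 0.76450976 0.78590099
0.78943823 0.77028847 0.78900234 0.77715074]
mean value: 0.7819118302880803
"""

# One record: metric name, bracketed per-fold values (may wrap), logged mean.
RECORD = re.compile(
    r"key: (?P<key>\w+)\s*\nvalue: \[(?P<values>[^\]]*)\]\s*\nmean value: (?P<mean>\S+)"
)


def parse_records(text):
    """Map each metric name to (per-fold values, logged mean)."""
    records = {}
    for match in RECORD.finditer(text):
        values = [float(token) for token in match.group("values").split()]
        records[match.group("key")] = (values, float(match.group("mean")))
    return records


records = parse_records(LOG_EXCERPT)
for key, (values, logged_mean) in records.items():
    # The log prints 8 decimals per fold, so allow for rounding.
    assert abs(fmean(values) - logged_mean) < 1e-6, key
    print(f"{key}: {len(values)} folds, mean {fmean(values):.6f}")

# Difference between mean training and mean cross-validation MCC.
gap = records["train_mcc"][1] - records["test_mcc"][1]
print(f"train/test MCC gap: {gap:.4f}")
```

A parser of this shape only works where a record is contiguous. Records that have stderr output spliced between `value:` and the array would need the warnings filtered out first.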
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
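
The same ConvergenceWarning text recurs once per non-converged fit, and the warning itself suggests raising max_iter or scaling the data. Separately from fixing the cause, the repeats can be recorded and summarised instead of printed individually. The sketch below shows that idea with the standard `warnings` module only; `ConvergenceNotice`, `fit_once` and `run_and_summarise` are hypothetical stand-ins, not part of the scripts that produced this log, and warnings raised in separate worker processes would not be captured this way.

```python
import warnings
from collections import Counter


class ConvergenceNotice(UserWarning):
    """Hypothetical stand-in for a library's convergence warning."""


def fit_once(fold):
    """Hypothetical fit step that always reports non-convergence."""
    warnings.warn("solver failed to converge; increase max_iter", ConvergenceNotice)
    return fold


def run_and_summarise(n_folds):
    """Run all folds, recording warnings instead of printing each repeat."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every occurrence
        for fold in range(n_folds):
            fit_once(fold)
    # Count identical (category, message) pairs.
    return Counter((w.category.__name__, str(w.message)) for w in caught)


summary = run_and_summarise(n_folds=10)
for (category, message), count in summary.items():
    print(f"{category} x{count}: {message}")
```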
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.037637 1.08531857 1.08370805 1.19189143 1.05426836 1.07793021
0.96841478 1.15566444 1.01701474 1.18346882]
mean value: 1.0855316400527955
key: score_time
value: [0.01516891 0.01343751 0.01641273 0.01554537 0.01356006 0.0157094
0.01250768 0.01551104 0.01563716 0.01549435]
mean value: 0.014898419380187988
key: test_mcc
value: [0.69373177 0.78859019 0.78376623 0.76698119 0.83897362 0.78420577
0.82447186 0.74951538 0.7823356 0.5938157 ]
mean value: 0.7606387310061538
key: train_mcc
value: [0.87069787 0.85462868 0.85611966 0.84294292 0.8064405 0.84259254
0.86623116 0.7977593 0.84675474 0.84990593]
mean value: 0.8434073301854806
key: test_accuracy
value: [0.84684685 0.89189189 0.89189189 0.88288288 0.91891892 0.89189189
0.90990991 0.87387387 0.89090909 0.79090909]
mean value: 0.878992628992629
key: train_accuracy
value: [0.93480441 0.92678034 0.92778335 0.92076229 0.90270812 0.92076229
0.9327984 0.89869609 0.92284569 0.9248497 ]
mean value: 0.9212790676639135
key: test_fscore
value: [0.8440367 0.89655172 0.89090909 0.88495575 0.92173913 0.89473684
0.91525424 0.87931034 0.89285714 0.80991736]
mean value: 0.8830268317391928
key: train_fscore
value: [0.93646139 0.92864125 0.92913386 0.92307692 0.9049951 0.92262488
0.93399015 0.90009891 0.92473118 0.92566898]
mean value: 0.92294226227868
key: test_precision
value: [0.85185185 0.85245902 0.89090909 0.86206897 0.89830508 0.87931034
0.87096774 0.85 0.87719298 0.74242424]
mean value: 0.8575489321060842
key: train_precision
value: [0.91412214 0.90648855 0.91295938 0.89772727 0.8833652 0.90057361
0.91682785 0.88693957 0.90267176 0.91568627]
mean value: 0.9037361609709368
key: test_recall
value: [0.83636364 0.94545455 0.89090909 0.90909091 0.94642857 0.91071429
0.96428571 0.91071429 0.90909091 0.89090909]
mean value: 0.9113961038961038
key: train_recall
value: [0.95991984 0.95190381 0.94589178 0.9498998 0.92771084 0.94578313
0.95180723 0.91365462 0.94789579 0.93587174]
mean value: 0.9430338588824234
key: test_roc_auc
value: [0.84675325 0.89237013 0.89188312 0.88311688 0.91866883 0.89172078
0.90941558 0.87353896 0.89090909 0.79090909]
mean value: 0.8789285714285714
key: train_roc_auc
value: [0.9347792 0.92675512 0.92776517 0.92073303 0.90273318 0.92078736
0.93281744 0.89871108 0.92284569 0.9248497 ]
mean value: 0.9212776959541573
key: test_jcc
value: [0.73015873 0.8125 0.80327869 0.79365079 0.85483871 0.80952381
0.84375 0.78461538 0.80645161 0.68055556]
mean value: 0.7919323284609509
key: train_jcc
value: [0.88051471 0.86678832 0.86764706 0.85714286 0.82647585 0.85636364
0.87615527 0.81834532 0.86 0.86162362]
mean value: 0.8571056637111273
MCC on Blind test: 0.66
Accuracy on Blind test: 0.86
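The `fit_time`, `score_time`, `test_*` and `train_*` keys above have the shape returned by scikit-learn's `cross_validate` when it is given a dictionary of scorers and `return_train_score=True`. The sketch below shows one way such output could be produced. The toy data, the column names (`num_a`, `num_b`, `cat_a`), the fold settings and the exact scorer definitions are assumptions; the logged script may define them differently.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: two numerical columns and one categorical column.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + 0.5 * X["num_b"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Same layout as the logged pipeline: scale numerical, one-hot encode categorical.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array, which GaussianNB needs
)
pipe = Pipeline(steps=[("prep", prep), ("model", GaussianNB())])

# Scorer names are chosen to reproduce the keys seen in the log (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

In the log, `test_roc_auc` tracks `test_accuracy` very closely in every fold, which would be consistent with the ROC AUC being computed from predicted labels rather than probabilities. That is an inference from the numbers, not something the log states.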
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01727653 0.01254821 0.01245856 0.01221752 0.01246524 0.01370215
0.01368165 0.01427627 0.01377892 0.01384521]
mean value: 0.013625025749206543
key: score_time
value: [0.01262164 0.00979233 0.00930595 0.00934267 0.01013041 0.01029086
0.01028466 0.01063132 0.01032662 0.01033711]
mean value: 0.010306358337402344
key: test_mcc
value: [0.5928164 0.49641957 0.50003497 0.42337662 0.75189742 0.53770284
0.58571429 0.35400098 0.58191437 0.47343208]
mean value: 0.5297309536091728
key: train_mcc
value: [0.53906794 0.53171804 0.55439159 0.57014286 0.55399337 0.55772149
0.53001234 0.54540938 0.56790568 0.56939559]
mean value: 0.5519758288087392
key: test_accuracy
value: [0.79279279 0.74774775 0.74774775 0.71171171 0.87387387 0.76576577
0.79279279 0.67567568 0.79090909 0.73636364]
mean value: 0.7635380835380835
key: train_accuracy
value: [0.76930792 0.76529589 0.77632899 0.78435306 0.77632899 0.77833501
0.76429288 0.77231695 0.78356713 0.78456914]
mean value: 0.7754695951582201
key: test_fscore
value: [0.77227723 0.73584906 0.7254902 0.70909091 0.88135593 0.75
0.79279279 0.66037736 0.78899083 0.74336283]
mean value: 0.7559587130529115
key: train_fscore
value: [0.76482618 0.75776398 0.76746611 0.77673936 0.76795005 0.77098446
0.75495308 0.76573787 0.77777778 0.78128179]
mean value: 0.7685480644155679
key: test_precision
value: [0.84782609 0.76470588 0.78723404 0.70909091 0.83870968 0.8125
0.8 0.7 0.7962963 0.72413793]
mean value: 0.7780500825703698
key: train_precision
value: [0.78079332 0.78372591 0.8 0.80603448 0.79697624 0.79657388
0.78524946 0.78768577 0.79915433 0.79338843]
mean value: 0.7929581826379648
key: test_recall
value: [0.70909091 0.70909091 0.67272727 0.70909091 0.92857143 0.69642857
0.78571429 0.625 0.78181818 0.76363636]
mean value: 0.7381168831168832
key: train_recall
value: [0.749499 0.73346693 0.73747495 0.749499 0.74096386 0.74698795
0.72690763 0.74497992 0.75751503 0.76953908]
mean value: 0.7456833345405671
key: test_roc_auc
value: [0.79204545 0.7474026 0.74707792 0.71168831 0.87337662 0.7663961
0.79285714 0.67613636 0.79090909 0.73636364]
mean value: 0.7634253246753246
key: train_roc_auc
value: [0.76932781 0.76532784 0.776368 0.78438805 0.77629355 0.7783036
0.76425542 0.77228956 0.78356713 0.78456914]
mean value: 0.7754690103097762
key: test_jcc
value: [0.62903226 0.58208955 0.56923077 0.54929577 0.78787879 0.6
0.65671642 0.49295775 0.65151515 0.5915493 ]
mean value: 0.6110265753739887
key: train_jcc
value: [0.6192053 0.61 0.62267343 0.63497453 0.62331081 0.62731872
0.60636516 0.62040134 0.63636364 0.64106845]
mean value: 0.6241681375865916
MCC on Blind test: 0.44
Accuracy on Blind test: 0.76
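The two "Blind test" lines report scores on data held out from cross-validation; the start of this log records a stratified 70/30 split. The sketch below shows how such a held-out evaluation could be computed. The synthetic data and the names `X_blind` and `y_blind` are hypothetical and do not come from the logged script.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data; the real dataset is not reproduced here.
X, y = make_classification(n_samples=400, n_features=10, random_state=42)

# Stratified 70/30 split: the 30% portion plays the role of the blind test set.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit on the training portion only, then score once on the held-out portion.
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_blind)

blind_mcc = matthews_corrcoef(y_blind, y_pred)
blind_acc = accuracy_score(y_blind, y_pred)
print("MCC on Blind test:", round(blind_mcc, 2))
print("Accuracy on Blind test:", round(blind_acc, 2))
```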
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0152173 0.01704001 0.01721168 0.02113652 0.01699519 0.01698351
0.01694894 0.01697063 0.0170331 0.01704311]
mean value: 0.017258000373840333
key: score_time
value: [0.0123899 0.01239848 0.01244283 0.0125103 0.01237535 0.01237679
0.01236582 0.01238203 0.01243782 0.01241136]
mean value: 0.012409067153930664
key: test_mcc
value: [0.60674852 0.53168696 0.51517746 0.56873266 0.71205754 0.51517746
0.67598342 0.58620801 0.73323558 0.56400939]
mean value: 0.6009016993199549
key: train_mcc
value: [0.63491093 0.63543061 0.62142983 0.61525929 0.6391103 0.66099253
0.62317718 0.65531326 0.65761181 0.63728607]
mean value: 0.6380521801166391
key: test_accuracy
value: [0.8018018 0.76576577 0.75675676 0.78378378 0.85585586 0.75675676
0.83783784 0.79279279 0.86363636 0.78181818]
mean value: 0.7996805896805896
key: train_accuracy
value: [0.81745236 0.81745236 0.81043129 0.80742227 0.81945838 0.83049147
0.8114343 0.82748245 0.82865731 0.81863727]
mean value: 0.8188919463802228
key: test_fscore
value: [0.78846154 0.75925926 0.74285714 0.77358491 0.85964912 0.76923077
0.84210526 0.8 0.87179487 0.78571429]
mean value: 0.7992657158943157
key: train_fscore
value: [0.81726908 0.81390593 0.80655067 0.80408163 0.82142857 0.82980866
0.80816327 0.83003953 0.83119447 0.81809045]
mean value: 0.818053225190839
key: test_precision
value: [0.83673469 0.77358491 0.78 0.80392157 0.84482759 0.73770492
0.82758621 0.77966102 0.82258065 0.77192982]
mean value: 0.7978531365973461
key: train_precision
value: [0.81891348 0.8308977 0.82426778 0.81912682 0.81176471 0.83232323
0.82157676 0.81712062 0.81906615 0.82056452]
mean value: 0.821562177423608
key: test_recall
value: [0.74545455 0.74545455 0.70909091 0.74545455 0.875 0.80357143
0.85714286 0.82142857 0.92727273 0.8 ]
mean value: 0.802987012987013
key: train_recall
value: [0.81563126 0.79759519 0.78957916 0.78957916 0.8313253 0.82730924
0.79518072 0.84337349 0.84368737 0.81563126]
mean value: 0.8148892161833707
key: test_roc_auc
value: [0.8012987 0.76558442 0.75633117 0.78344156 0.85568182 0.75633117
0.83766234 0.79253247 0.86363636 0.78181818]
mean value: 0.7994318181818182
key: train_roc_auc
value: [0.81745419 0.81747229 0.81045223 0.80744018 0.81947027 0.83048829
0.81141802 0.82749837 0.82865731 0.81863727]
mean value: 0.8188988418604277
key: test_jcc
value: [0.65079365 0.6119403 0.59090909 0.63076923 0.75384615 0.625
0.72727273 0.66666667 0.77272727 0.64705882]
mean value: 0.6676983915021667
key: train_jcc
value: [0.6910017 0.6862069 0.67581475 0.67235495 0.6969697 0.7091222
0.67808219 0.70945946 0.71114865 0.69217687]
mean value: 0.6922337365141537
MCC on Blind test: 0.5
Accuracy on Blind test: 0.79
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01553726 0.0123558 0.01168871 0.0119946 0.01135278 0.01145506
0.01270914 0.01214385 0.01241255 0.01243854]
mean value: 0.012408828735351563
key: score_time
value: [0.03279448 0.01701331 0.01946092 0.01539755 0.01587725 0.01698089
0.01995373 0.01609039 0.01584721 0.01604939]
mean value: 0.018546509742736816
key: test_mcc
value: [0.65081289 0.49545455 0.48459368 0.53318254 0.62382476 0.61299389
0.72309474 0.49561285 0.5731902 0.56400939]
mean value: 0.5756769480530834
key: train_mcc
value: [0.73735103 0.69400592 0.73861246 0.7497858 0.73622878 0.74649978
0.73511426 0.72317842 0.72360009 0.74076811]
mean value: 0.7325144649619348
key: test_accuracy
value: [0.81981982 0.74774775 0.73873874 0.76576577 0.81081081 0.8018018
0.85585586 0.74774775 0.78181818 0.78181818]
mean value: 0.7851924651924652
key: train_accuracy
value: [0.8665998 0.84553661 0.86760281 0.87261785 0.8665998 0.87061184
0.86459378 0.86058175 0.85871743 0.86873747]
mean value: 0.8642199142517734
key: test_fscore
value: [0.83333333 0.74545455 0.75630252 0.77192982 0.82051282 0.81967213
0.86885246 0.75438596 0.8 0.78571429]
mean value: 0.7956157885661007
key: train_fscore
value: [0.87345385 0.85249042 0.87380497 0.87939221 0.87223823 0.87772512
0.87252125 0.86544046 0.86735654 0.87464115]
mean value: 0.8709064207475893
key: test_precision
value: [0.76923077 0.74545455 0.703125 0.74576271 0.78688525 0.75757576
0.8030303 0.74137931 0.73846154 0.77192982]
mean value: 0.7562835006425191
key: train_precision
value: [0.83152174 0.81651376 0.83546618 0.83574007 0.83609576 0.83123878
0.82352941 0.83551402 0.81737589 0.83699634]
mean value: 0.8299991949383702
key: test_recall
value: [0.90909091 0.74545455 0.81818182 0.8 0.85714286 0.89285714
0.94642857 0.76785714 0.87272727 0.8 ]
mean value: 0.8409740259740259
key: train_recall
value: [0.91983968 0.89178357 0.91583166 0.92785571 0.91164659 0.92971888
0.92771084 0.89759036 0.9238477 0.91583166]
mean value: 0.9161656646626587
key: test_roc_auc
value: [0.82061688 0.74772727 0.73944805 0.76607143 0.81038961 0.80097403
0.85503247 0.74756494 0.78181818 0.78181818]
mean value: 0.785146103896104
key: train_roc_auc
value: [0.86654635 0.84549018 0.86755439 0.87256239 0.86664494 0.87067106
0.86465702 0.86061883 0.85871743 0.86873747]
mean value: 0.8642200062776154
key: test_jcc
value: [0.71428571 0.5942029 0.60810811 0.62857143 0.69565217 0.69444444
0.76811594 0.6056338 0.66666667 0.64705882]
mean value: 0.6622740002915428
key: train_jcc
value: [0.77533784 0.74290484 0.77589134 0.78474576 0.77342419 0.78209459
0.77386935 0.76279863 0.76578073 0.77721088]
mean value: 0.7714058165400389
MCC on Blind test: 0.43
Accuracy on Blind test: 0.74
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.05147576 0.05113029 0.05149794 0.05157065 0.05220318 0.05111527
0.05288601 0.05138397 0.05986619 0.06046939]
mean value: 0.05335986614227295
key: score_time
value: [0.01986051 0.01984072 0.01992202 0.01969242 0.01982999 0.01954079
0.01995969 0.0201807 0.0201447 0.01967216]
mean value: 0.019864368438720702
key: test_mcc
value: [0.71224427 0.66693232 0.65875884 0.69483296 0.80802876 0.64590588
0.7306455 0.67932297 0.77407027 0.67363307]
mean value: 0.7044374854667038
key: train_mcc
value: [0.74644043 0.7314568 0.75527303 0.74901457 0.73276182 0.76010338
0.74112632 0.75105359 0.74135713 0.76031401]
mean value: 0.7468901068438255
key: test_accuracy
value: [0.85585586 0.82882883 0.82882883 0.84684685 0.9009009 0.81981982
0.86486486 0.83783784 0.88181818 0.82727273]
mean value: 0.8492874692874692
key: train_accuracy
value: [0.87061184 0.86359077 0.87462387 0.87161484 0.86459378 0.8776329
0.86860582 0.87362086 0.86873747 0.87775551]
mean value: 0.87113876700241
key: test_fscore
value: [0.85714286 0.84033613 0.83185841 0.84955752 0.90756303 0.83333333
0.86956522 0.84745763 0.8907563 0.84552846]
mean value: 0.8573098881659105
key: train_fscore
value: [0.87795648 0.87072243 0.88218662 0.87924528 0.8708134 0.88403042
0.87488061 0.87954111 0.87511916 0.88425047]
mean value: 0.877874598461022
key: test_precision
value: [0.84210526 0.78125 0.81034483 0.82758621 0.85714286 0.78125
0.84745763 0.80645161 0.828125 0.76470588]
mean value: 0.8146419277158321
key: train_precision
value: [0.83154122 0.82820976 0.83274021 0.83065954 0.83180987 0.83935018
0.83424408 0.83941606 0.83454545 0.83963964]
mean value: 0.8342156018881279
key: test_recall
value: [0.87272727 0.90909091 0.85454545 0.87272727 0.96428571 0.89285714
0.89285714 0.89285714 0.96363636 0.94545455]
mean value: 0.9061038961038961
key: train_recall
value: [0.92985972 0.91783567 0.93787575 0.93386774 0.91365462 0.93373494
0.91967871 0.92369478 0.91983968 0.93386774]
mean value: 0.9263909344794006
key: test_roc_auc
value: [0.85600649 0.82954545 0.82905844 0.84707792 0.90032468 0.81915584
0.86461039 0.83733766 0.88181818 0.82727273]
mean value: 0.8492207792207792
key: train_roc_auc
value: [0.87055235 0.86353631 0.87456037 0.87155234 0.86464294 0.87768911
0.86865699 0.87367104 0.86873747 0.87775551]
mean value: 0.8711354435779189
key: test_jcc
value: [0.75 0.72463768 0.71212121 0.73846154 0.83076923 0.71428571
0.76923077 0.73529412 0.8030303 0.73239437]
mean value: 0.7510224932902431
key: train_jcc
value: [0.78246206 0.77104377 0.78920742 0.78451178 0.77118644 0.79216354
0.77758913 0.78498294 0.7779661 0.79251701]
mean value: 0.7823630194686007
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [3.41872621 3.53579807 3.53889394 3.50232983 3.69363928 3.61123085
1.48464918 3.60396457 3.29749489 3.37618065]
mean value: 3.3062907457351685
key: score_time
value: [0.01836133 0.01513553 0.01684785 0.01543593 0.01529264 0.0153451
0.01301169 0.02348351 0.01276898 0.02390289]
mean value: 0.01695854663848877
key: test_mcc
value: [0.91368563 0.89249761 0.85644694 0.76905945 0.88077101 0.87733514
0.82824452 0.83897362 0.89625816 0.74848119]
mean value: 0.8501753280181599
key: train_mcc
value: [0.99799599 1. 0.99398393 0.99198394 0.99599599 0.99599599
0.87092746 0.99599599 0.99400594 0.997998 ]
mean value: 0.983488322406025
key: test_accuracy
value: [0.95495495 0.94594595 0.92792793 0.88288288 0.93693694 0.93693694
0.90990991 0.91891892 0.94545455 0.86363636]
mean value: 0.9223505323505323
key: train_accuracy
value: [0.99899699 1. 0.99699097 0.99598796 0.99799398 0.99799398
0.9327984 0.99799398 0.99699399 0.998998 ]
mean value: 0.9914748252774355
key: test_fscore
value: [0.95652174 0.94642857 0.92857143 0.88695652 0.94117647 0.94017094
0.91666667 0.92173913 0.94827586 0.87804878]
mean value: 0.926455611128696
key: train_fscore
value: [0.99899699 1. 0.996997 0.99598394 0.99799599 0.99799599
0.93625119 0.99799599 0.99698492 0.998999 ]
mean value: 0.9918201012630389
key: test_precision
value: [0.91666667 0.92982456 0.9122807 0.85 0.88888889 0.90163934
0.859375 0.89830508 0.90163934 0.79411765]
mean value: 0.8852737239042626
key: train_precision
value: [1. 1. 0.996 0.99798793 0.996 0.996
0.88969259 0.996 1. 0.998 ]
mean value: 0.986968051346051
key: test_recall
value: [1. 0.96363636 0.94545455 0.92727273 1. 0.98214286
0.98214286 0.94642857 1. 0.98181818]
mean value: 0.9728896103896104
key: train_recall
value: [0.99799599 1. 0.99799599 0.99398798 1. 1.
0.98795181 1. 0.99398798 1. ]
mean value: 0.9971919743100659
key: test_roc_auc
value: [0.95535714 0.9461039 0.92808442 0.88327922 0.93636364 0.93652597
0.90925325 0.91866883 0.94545455 0.86363636]
mean value: 0.9222727272727272
key: train_roc_auc
value: [0.998998 1. 0.99698996 0.99598997 0.99799599 0.99799599
0.93285366 0.99799599 0.99699399 0.998998 ]
mean value: 0.9914811550812468
key: test_jcc
value: [0.91666667 0.89830508 0.86666667 0.796875 0.88888889 0.88709677
0.84615385 0.85483871 0.90163934 0.7826087 ]
mean value: 0.8639739676907268
key: train_jcc
value: [0.99799599 1. 0.99401198 0.992 0.996 0.996
0.88014311 0.996 0.99398798 0.998 ]
mean value: 0.9844139056685028
MCC on Blind test: 0.58
Accuracy on Blind test: 0.83
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.05418754 0.04911709 0.04165864 0.04315972 0.05047321 0.0487411
0.04731131 0.04764485 0.04171062 0.0402627 ]
mean value: 0.04642667770385742
key: score_time
value: [0.00953841 0.00912523 0.00918102 0.00916767 0.00922489 0.00934124
0.0091722 0.00921059 0.00914741 0.00930452]
mean value: 0.009241318702697754
key: test_mcc
value: [0.94735177 0.89188312 0.84137254 0.91368563 0.91355091 0.89414155
0.87398511 0.89414155 0.94686415 0.94686415]
mean value: 0.9063840476442566
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.94594595 0.91891892 0.95495495 0.95495495 0.94594595
0.93693694 0.94594595 0.97272727 0.97272727]
mean value: 0.9522031122031123
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97345133 0.94545455 0.92173913 0.95652174 0.95726496 0.94827586
0.9380531 0.94827586 0.97345133 0.97345133]
mean value: 0.9535939176068668
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94827586 0.94545455 0.88333333 0.91666667 0.91803279 0.91666667
0.92982456 0.91666667 0.94827586 0.94827586]
mean value: 0.927147281328353
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94545455 0.96363636 1. 1. 0.98214286
0.94642857 0.98214286 1. 1. ]
mean value: 0.9819805194805195
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97321429 0.94594156 0.91931818 0.95535714 0.95454545 0.94561688
0.93685065 0.94561688 0.97272727 0.97272727]
mean value: 0.9521915584415583
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94827586 0.89655172 0.85483871 0.91666667 0.91803279 0.90163934
0.88333333 0.90163934 0.94827586 0.94827586]
mean value: 0.9117529495432083
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.66
Accuracy on Blind test: 0.87
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.17229462 0.1718049 0.17255998 0.17445517 0.17378926 0.17340136
0.17458105 0.17313957 0.17255402 0.17457175]
mean value: 0.173315167427063
key: score_time
value: [0.01905179 0.01922989 0.01923728 0.01930737 0.01945829 0.01919985
0.0191958 0.01988244 0.01879358 0.02033043]
mean value: 0.019368672370910646
key: test_mcc
value: [0.94735177 0.91003577 0.91127765 0.96396104 0.88077101 0.89704631
0.89242811 0.89242811 0.89625816 0.83984125]
mean value: 0.9031399193828875
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.95495495 0.95495495 0.98198198 0.93693694 0.94594595
0.94594595 0.94594595 0.94545455 0.91818182]
mean value: 0.9503276003276003
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97345133 0.95412844 0.95575221 0.98181818 0.94117647 0.94915254
0.94736842 0.94736842 0.94827586 0.92173913]
mean value: 0.952023100957829
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94827586 0.96296296 0.93103448 0.98181818 0.88888889 0.90322581
0.93103448 0.93103448 0.90163934 0.88333333]
mean value: 0.9263247828062102
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94545455 0.98181818 0.98181818 1. 1.
0.96428571 0.96428571 1. 0.96363636]
mean value: 0.9801298701298702
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97321429 0.95487013 0.95519481 0.98198052 0.93636364 0.94545455
0.94577922 0.94577922 0.94545455 0.91818182]
mean value: 0.9502272727272727
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94827586 0.9122807 0.91525424 0.96428571 0.88888889 0.90322581
0.9 0.9 0.90163934 0.85483871]
mean value: 0.9088689264677418
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.5
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01375628 0.0127883 0.0128231 0.01299429 0.01285195 0.01358485
0.0128386 0.01285863 0.01297832 0.01316524]
mean value: 0.013063955307006835
key: score_time
value: [0.00929976 0.00993133 0.00923967 0.009238 0.00926399 0.00926995
0.00940943 0.00925875 0.00927234 0.00982141]
mean value: 0.009400463104248047
key: test_mcc
value: [0.88102763 0.77216596 0.72112155 0.84137254 0.70489656 0.75979502
0.80802876 0.82447186 0.7793831 0.69564113]
mean value: 0.7787904107892152
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93693694 0.88288288 0.84684685 0.91891892 0.83783784 0.87387387
0.9009009 0.90990991 0.88181818 0.83636364]
mean value: 0.8826289926289926
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94017094 0.88888889 0.864 0.92173913 0.859375 0.8852459
0.90756303 0.91525424 0.89256198 0.85483871]
mean value: 0.8929637816780669
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88709677 0.83870968 0.77142857 0.88333333 0.76388889 0.81818182
0.85714286 0.87096774 0.81818182 0.76811594]
mean value: 0.827704742273466
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.94545455 0.98181818 0.96363636 0.98214286 0.96428571
0.96428571 0.96428571 0.98181818 0.96363636]
mean value: 0.9711363636363637
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.88344156 0.84805195 0.91931818 0.83652597 0.87305195
0.90032468 0.90941558 0.88181818 0.83636364]
mean value: 0.8825811688311689
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88709677 0.8 0.76056338 0.85483871 0.75342466 0.79411765
0.83076923 0.84375 0.80597015 0.74647887]
mean value: 0.8077009422008127
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.45
Accuracy on Blind test: 0.79
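Note: each model block in this log prints its cross-validation scores as "key:" / "value: [...]" / "mean value:" groups, with the array wrapped over two lines. Below is a minimal sketch of how such a group could be read back and its mean recomputed, using only the Python standard library. The function name parse_cv_scores and the sample variable are hypothetical and are not part of the logged script. The sample text is the test_roc_auc group printed just above.

```python
from statistics import fmean


def parse_cv_scores(lines):
    """Collect 'key:' / 'value: [...]' groups into {key: [floats]}."""
    scores, key, buf = {}, None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith("key:"):
            # Start a new group; the array follows on the next line(s).
            key, buf = line[len("key:"):].strip(), []
        elif key is not None and (line.startswith("value:") or buf):
            # The array may wrap over several lines; gather until ']'.
            buf.append(line[len("value:"):] if line.startswith("value:") else line)
            if "]" in line:
                text = " ".join(buf).replace("[", " ").replace("]", " ")
                scores[key] = [float(tok) for tok in text.split()]
                key, buf = None, []
        # 'mean value:' lines are ignored; the mean is recomputed below.
    return scores


# Sample copied from the test_roc_auc group above (hypothetical variable name).
sample = """\
key: test_roc_auc
value: [0.9375 0.88344156 0.84805195 0.91931818 0.83652597 0.87305195
 0.90032468 0.90941558 0.88181818 0.83636364]
mean value: 0.8825811688311689
""".splitlines()

scores = parse_cv_scores(sample)
recomputed_mean = fmean(scores["test_roc_auc"])
print(len(scores["test_roc_auc"]), recomputed_mean)
```

The recomputed mean should agree with the logged "mean value" up to the rounding of the printed fold scores.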
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.808599 2.80106878 2.77748418 2.78659678 2.8147788 2.78457665
2.77041578 2.77824712 2.79634237 2.76898098]
mean value: 2.788709044456482
key: score_time
value: [0.09930849 0.09889126 0.09994721 0.10435081 0.09950209 0.09872508
0.1585269 0.10039687 0.09920406 0.09858346]
mean value: 0.1057436227798462
key: test_mcc
value: [0.96459895 0.96396104 0.85816689 0.94735177 0.93029809 0.89704631
0.87508299 0.91003577 0.87988269 0.84322091]
mean value: 0.9069645418161864
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98198198 0.98198198 0.92792793 0.97297297 0.96396396 0.94594595
0.93693694 0.95495495 0.93636364 0.91818182]
mean value: 0.9521212121212121
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98214286 0.98181818 0.92982456 0.97345133 0.96551724 0.94915254
0.93913043 0.95575221 0.94017094 0.92307692]
mean value: 0.954003722197022
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96491228 0.98181818 0.89830508 0.94827586 0.93333333 0.90322581
0.91525424 0.94736842 0.88709677 0.87096774]
mean value: 0.925055772358941
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.98181818 0.96363636 1. 1. 1.
0.96428571 0.96428571 1. 0.98181818]
mean value: 0.9855844155844156
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.98198052 0.92824675 0.97321429 0.96363636 0.94545455
0.93668831 0.95487013 0.93636364 0.91818182]
mean value: 0.9520779220779221
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96491228 0.96428571 0.86885246 0.94827586 0.93333333 0.90322581
0.8852459 0.91525424 0.88709677 0.85714286]
mean value: 0.912762522612166
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.89
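Note: the per-fold metrics above (mcc, accuracy, fscore, precision, recall, jcc) can all be derived from one confusion matrix per fold. The sketch below shows the standard formulas in plain Python. The counts TP=55, FP=2, FN=0, TN=54 are an assumption: they are not printed in this log, but were inferred because they reproduce the first-fold Random Forest values above (for example accuracy 109/111 = 0.98198198). The balanced-accuracy line is included because a ROC AUC computed from hard class labels equals balanced accuracy, which would be consistent with the logged test_roc_auc for this fold.

```python
from math import sqrt

# Assumed fold-1 confusion counts (inferred from the logged scores, not logged themselves).
tp, fp, fn, tn = 55, 2, 0, 54

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)   # F1 score
jaccard = tp / (tp + fp + fn)          # Jaccard index of the positive class
# Matthews correlation coefficient for a binary confusion matrix.
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# Mean of sensitivity and specificity.
balanced_accuracy = (recall + tn / (tn + fp)) / 2

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore),
                    ("jaccard", jaccard), ("mcc", mcc),
                    ("balanced_accuracy", balanced_accuracy)]:
    print(f"{name}: {value:.8f}")
```

With FN = 0 the Jaccard index reduces to TP / (TP + FP), which is why test_jcc and test_precision coincide in the first fold.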
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.13378525 1.13587284 1.13098383 1.13388467 1.16668534 1.18749261
1.24799681 1.14120936 1.1288693 1.20383883]
mean value: 1.1610618829727173
key: score_time
value: [0.23726964 0.2293191 0.28423762 0.23544788 0.28838253 0.29559493
0.28117895 0.27506924 0.28288841 0.22672176]
mean value: 0.26361100673675536
key: test_mcc
value: [0.9461039 0.91006494 0.82205752 0.89188312 0.93029809 0.85798501
0.91355091 0.89242811 0.83984125 0.87635609]
mean value: 0.888056893772736
key: train_mcc
value: [0.96427099 0.95418486 0.95040062 0.95819837 0.96025947 0.94616064
0.96221384 0.95213257 0.95209501 0.95824107]
mean value: 0.9558157467241123
key: test_accuracy
value: [0.97297297 0.95495495 0.90990991 0.94594595 0.96396396 0.92792793
0.95495495 0.94594595 0.91818182 0.93636364]
mean value: 0.9431122031122031
key: train_accuracy
value: [0.98194584 0.97693079 0.97492477 0.97893681 0.97993982 0.97291876
0.98094283 0.97592778 0.9759519 0.97895792]
mean value: 0.9777377221845899
key: test_fscore
value: [0.97297297 0.95495495 0.9122807 0.94545455 0.96551724 0.93103448
0.95726496 0.94736842 0.92173913 0.93913043]
mean value: 0.9447717842809771
key: train_fscore
value: [0.98221344 0.97725025 0.97536946 0.97922849 0.98019802 0.97324083
0.98116947 0.97619048 0.97619048 0.97922849]
mean value: 0.9780279396854765
key: test_precision
value: [0.96428571 0.94642857 0.88135593 0.94545455 0.93333333 0.9
0.91803279 0.93103448 0.88333333 0.9 ]
mean value: 0.9203258699682755
key: train_precision
value: [0.96881092 0.96484375 0.95930233 0.96679688 0.96679688 0.96086106
0.96868885 0.96470588 0.96660118 0.96679688]
mean value: 0.9654204580048241
key: test_recall
value: [0.98181818 0.96363636 0.94545455 0.94545455 1. 0.96428571
1. 0.96428571 0.96363636 0.98181818]
mean value: 0.971038961038961
key: train_recall
value: [0.99599198 0.98997996 0.99198397 0.99198397 0.9939759 0.98594378
0.9939759 0.98795181 0.98597194 0.99198397]
mean value: 0.9909743181141399
key: test_roc_auc
value: [0.97305195 0.95503247 0.91022727 0.94594156 0.96363636 0.9275974
0.95454545 0.94577922 0.91818182 0.93636364]
mean value: 0.9430357142857143
key: train_roc_auc
value: [0.98193173 0.97691769 0.97490765 0.97892371 0.97995388 0.97293181
0.98095589 0.97593983 0.9759519 0.97895792]
mean value: 0.977737201310251
key: test_jcc
value: [0.94736842 0.9137931 0.83870968 0.89655172 0.93333333 0.87096774
0.91803279 0.9 0.85483871 0.8852459 ]
mean value: 0.8958841399529021
key: train_jcc
value: [0.96504854 0.95551257 0.95192308 0.95930233 0.96116505 0.94787645
0.96303502 0.95348837 0.95348837 0.95930233]
mean value: 0.9570142104370474
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02807689 0.01736927 0.01711869 0.01721525 0.01714611 0.01708817
0.01749229 0.01700449 0.01726103 0.01719761]
mean value: 0.018296980857849122
key: score_time
value: [0.01289463 0.01240635 0.01258349 0.01251745 0.01250124 0.01252055
0.01249766 0.01253819 0.01249957 0.01250172]
mean value: 0.012546086311340332
key: test_mcc
value: [0.60674852 0.53168696 0.51517746 0.56873266 0.71205754 0.51517746
0.67598342 0.58620801 0.73323558 0.56400939]
mean value: 0.6009016993199549
key: train_mcc
value: [0.63491093 0.63543061 0.62142983 0.61525929 0.6391103 0.66099253
0.62317718 0.65531326 0.65761181 0.63728607]
mean value: 0.6380521801166391
key: test_accuracy
value: [0.8018018 0.76576577 0.75675676 0.78378378 0.85585586 0.75675676
0.83783784 0.79279279 0.86363636 0.78181818]
mean value: 0.7996805896805896
key: train_accuracy
value: [0.81745236 0.81745236 0.81043129 0.80742227 0.81945838 0.83049147
0.8114343 0.82748245 0.82865731 0.81863727]
mean value: 0.8188919463802228
key: test_fscore
value: [0.78846154 0.75925926 0.74285714 0.77358491 0.85964912 0.76923077
0.84210526 0.8 0.87179487 0.78571429]
mean value: 0.7992657158943157
key: train_fscore
value: [0.81726908 0.81390593 0.80655067 0.80408163 0.82142857 0.82980866
0.80816327 0.83003953 0.83119447 0.81809045]
mean value: 0.818053225190839
key: test_precision
value: [0.83673469 0.77358491 0.78 0.80392157 0.84482759 0.73770492
0.82758621 0.77966102 0.82258065 0.77192982]
mean value: 0.7978531365973461
key: train_precision
value: [0.81891348 0.8308977 0.82426778 0.81912682 0.81176471 0.83232323
0.82157676 0.81712062 0.81906615 0.82056452]
mean value: 0.821562177423608
key: test_recall
value: [0.74545455 0.74545455 0.70909091 0.74545455 0.875 0.80357143
0.85714286 0.82142857 0.92727273 0.8 ]
mean value: 0.802987012987013
key: train_recall
value: [0.81563126 0.79759519 0.78957916 0.78957916 0.8313253 0.82730924
0.79518072 0.84337349 0.84368737 0.81563126]
mean value: 0.8148892161833707
key: test_roc_auc
value: [0.8012987 0.76558442 0.75633117 0.78344156 0.85568182 0.75633117
0.83766234 0.79253247 0.86363636 0.78181818]
mean value: 0.7994318181818182
key: train_roc_auc
value: [0.81745419 0.81747229 0.81045223 0.80744018 0.81947027 0.83048829
0.81141802 0.82749837 0.82865731 0.81863727]
mean value: 0.8188988418604277
key: test_jcc
value: [0.65079365 0.6119403 0.59090909 0.63076923 0.75384615 0.625
0.72727273 0.66666667 0.77272727 0.64705882]
mean value: 0.6676983915021667
key: train_jcc
value: [0.6910017 0.6862069 0.67581475 0.67235495 0.6969697 0.7091222
0.67808219 0.70945946 0.71114865 0.69217687]
mean value: 0.6922337365141537
MCC on Blind test: 0.5
Accuracy on Blind test: 0.79
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.16017747 0.1352632 0.1334641 0.28510332 0.13021255 0.1371038
0.13196158 0.13735604 0.136549 0.13277578]
mean value: 0.15199668407440187
key: score_time
value: [0.01133108 0.01133108 0.01121116 0.01118255 0.01125717 0.01123095
0.01119471 0.01144171 0.01123309 0.01124144]
mean value: 0.01126549243927002
key: test_mcc
value: [0.94735177 0.94735177 0.91127765 0.91368563 0.91355091 0.87733514
0.91119237 0.96457634 0.87988269 0.91287093]
mean value: 0.917907519038506
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.97297297 0.97297297 0.95495495 0.95495495 0.95495495 0.93693694
0.95495495 0.98198198 0.93636364 0.95454545]
mean value: 0.9575593775593776
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97345133 0.97345133 0.95575221 0.95652174 0.95726496 0.94017094
0.95652174 0.98245614 0.94017094 0.95652174]
mean value: 0.9592283062605657
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.94827586 0.94827586 0.93103448 0.91666667 0.91803279 0.90163934
0.93220339 0.96551724 0.88709677 0.91666667]
mean value: 0.9265409076780793
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.98181818 1. 1. 0.98214286
0.98214286 1. 1. 1. ]
mean value: 0.9946103896103896
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97321429 0.97321429 0.95519481 0.95535714 0.95454545 0.93652597
0.95470779 0.98181818 0.93636364 0.95454545]
mean value: 0.9575487012987013
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94827586 0.94827586 0.91525424 0.91666667 0.91803279 0.88709677
0.91666667 0.96551724 0.88709677 0.91666667]
mean value: 0.9219549538077719
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05546999 0.08607674 0.0726912 0.08219934 0.06160903 0.08801007
0.0710063 0.07505107 0.10596418 0.08573651]
mean value: 0.07838144302368164
key: score_time
value: [0.01859093 0.01929092 0.01232386 0.01251221 0.01444817 0.01904821
0.01238537 0.01876664 0.04351854 0.02110696]
mean value: 0.019199180603027343
key: test_mcc
value: [0.73090707 0.69483296 0.65875884 0.67619361 0.76868784 0.74772727
0.79177679 0.62617314 0.72739297 0.6634888 ]
mean value: 0.7085939295611897
key: train_mcc
value: [0.83152863 0.79823275 0.82348413 0.82612027 0.80931961 0.82116323
0.81783952 0.80406528 0.81161814 0.82958324]
mean value: 0.817295481744602
key: test_accuracy
value: [0.86486486 0.84684685 0.82882883 0.83783784 0.88288288 0.87387387
0.89189189 0.81081081 0.86363636 0.82727273]
mean value: 0.8528746928746929
key: train_accuracy
value: [0.91474423 0.89869609 0.9107322 0.91173521 0.90371113 0.90972919
0.90772317 0.90170512 0.90480962 0.91382766]
mean value: 0.9077413603536059
key: test_fscore
value: [0.86725664 0.84955752 0.83185841 0.83928571 0.88888889 0.875
0.9 0.82352941 0.86238532 0.84033613]
mean value: 0.8578098036865689
key: train_fscore
value: [0.91771539 0.90107738 0.91384318 0.91522158 0.90679612 0.91245136
0.91102515 0.90354331 0.90803485 0.91666667]
mean value: 0.9106374969508796
key: test_precision
value: [0.84482759 0.82758621 0.81034483 0.8245614 0.85245902 0.875
0.84375 0.77777778 0.87037037 0.78125 ]
mean value: 0.8307927188740017
key: train_precision
value: [0.88764045 0.88122605 0.88389513 0.8812616 0.87781955 0.88490566
0.87873134 0.88610039 0.87827715 0.88742964]
mean value: 0.8827286965430265
key: test_recall
value: [0.89090909 0.87272727 0.85454545 0.85454545 0.92857143 0.875
0.96428571 0.875 0.85454545 0.90909091]
mean value: 0.8879220779220779
key: train_recall
value: [0.9498998 0.92184369 0.94589178 0.95190381 0.937751 0.94176707
0.94578313 0.92168675 0.93987976 0.94789579]
mean value: 0.9404302581065745
key: test_roc_auc
value: [0.8650974 0.84707792 0.82905844 0.83798701 0.88246753 0.87386364
0.89123377 0.81022727 0.86363636 0.82727273]
mean value: 0.8527922077922078
key: train_roc_auc
value: [0.91470894 0.89867285 0.9106969 0.91169488 0.90374524 0.90976129
0.90776131 0.90172514 0.90480962 0.91382766]
mean value: 0.9077403803591118
key: test_jcc
value: [0.765625 0.73846154 0.71212121 0.72307692 0.8 0.77777778
0.81818182 0.7 0.75806452 0.72463768]
mean value: 0.7517946466907722
key: train_jcc
value: [0.84794275 0.81996435 0.84135472 0.84369449 0.8294849 0.83899821
0.8365897 0.82405745 0.83156028 0.84615385]
mean value: 0.8359800713703212
MCC on Blind test: 0.59
Accuracy on Blind test: 0.83
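Note: for the five named model blocks above, the mean cross-validated test_mcc is consistently higher than the MCC on the blind test set. The sketch below tabulates that gap and ranks the models by blind-test MCC. The numbers are copied from this log. The dictionary layout and variable names are hypothetical, and the gap is only a rough descriptive figure, not a formal test of overfitting.

```python
# (mean CV test_mcc, MCC on blind test), copied from the model blocks above.
results = {
    "Random Forest":  (0.9069645418161864, 0.72),
    "Random Forest2": (0.888056893772736, 0.78),
    "Naive Bayes":    (0.6009016993199549, 0.50),
    "XGBoost":        (0.917907519038506, 0.73),
    "LDA":            (0.7085939295611897, 0.59),
}

# Difference between cross-validated and blind-test MCC for each model.
gaps = {name: cv - blind for name, (cv, blind) in results.items()}

# Model names ordered by blind-test MCC, best first.
ranked = sorted(results, key=lambda name: results[name][1], reverse=True)

for name in ranked:
    cv, blind = results[name]
    print(f"{name:15s} cv_mcc={cv:.3f} blind_mcc={blind:.2f} gap={gaps[name]:.3f}")
```

Ranking on the blind-test column rather than the CV column changes the order here: XGBoost has the highest CV mean, while Random Forest2 has the highest blind-test MCC.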
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02062082 0.01450729 0.01637959 0.01642632 0.01640606 0.01634121
0.01633549 0.01689839 0.01637173 0.0293107 ]
mean value: 0.01795976161956787
key: score_time
value: [0.01051903 0.0122385 0.01214838 0.0124464 0.01239777 0.01236987
0.01209283 0.01214767 0.01245856 0.01252127]
mean value: 0.012134027481079102
key: test_mcc
value: [0.60383519 0.55139323 0.4778799 0.56873266 0.74772727 0.53149351
0.6576811 0.60540128 0.63678479 0.52762168]
mean value: 0.5908550608068347
key: train_mcc
value: [0.59486707 0.60893242 0.60503014 0.59892457 0.58902021 0.60720182
0.58895544 0.59886665 0.60359924 0.59950665]
mean value: 0.5994904218517351
key: test_accuracy
value: [0.8018018 0.77477477 0.73873874 0.78378378 0.87387387 0.76576577
0.82882883 0.8018018 0.81818182 0.76363636]
mean value: 0.7951187551187551
key: train_accuracy
value: [0.79739218 0.80441324 0.80240722 0.79939819 0.79438315 0.80341023
0.79438315 0.79939819 0.80160321 0.7995992 ]
mean value: 0.799638796147963
key: test_fscore
value: [0.7962963 0.76190476 0.72897196 0.77358491 0.875 0.76785714
0.83185841 0.7962963 0.82142857 0.76785714]
mean value: 0.7921055486997057
key: train_fscore
value: [0.7959596 0.80283114 0.8 0.79757085 0.79102956 0.799591
0.79145473 0.79757085 0.79795918 0.79633401]
mean value: 0.7970300928959978
key: test_precision
value: [0.81132075 0.8 0.75 0.80392157 0.875 0.76785714
0.8245614 0.82692308 0.80701754 0.75438596]
mean value: 0.8020987455405354
key: train_precision
value: [0.80244399 0.81020408 0.81069959 0.80572597 0.80331263 0.81458333
0.80206186 0.80408163 0.81288981 0.80952381]
mean value: 0.8075526706803229
key: test_recall
value: [0.78181818 0.72727273 0.70909091 0.74545455 0.875 0.76785714
0.83928571 0.76785714 0.83636364 0.78181818]
mean value: 0.7831818181818182
key: train_recall
value: [0.78957916 0.79559118 0.78957916 0.78957916 0.77911647 0.78514056
0.7811245 0.79116466 0.78356713 0.78356713]
mean value: 0.7868009110590659
key: test_roc_auc
value: [0.80162338 0.77435065 0.73847403 0.78344156 0.87386364 0.76574675
0.82873377 0.80211039 0.81818182 0.76363636]
mean value: 0.7950162337662338
key: train_roc_auc
value: [0.79740002 0.8044221 0.8024201 0.79940805 0.79436785 0.80339192
0.79436986 0.79938994 0.80160321 0.7995992 ]
mean value: 0.7996372262597484
key: test_jcc
value: [0.66153846 0.61538462 0.57352941 0.63076923 0.77777778 0.62318841
0.71212121 0.66153846 0.6969697 0.62318841]
mean value: 0.6576005679458365
key: train_jcc
value: [0.66107383 0.67060811 0.66666667 0.66329966 0.65430017 0.66609881
0.65488215 0.66329966 0.66383701 0.66159052]
mean value: 0.6625656594308654
MCC on Blind test: 0.48
Accuracy on Blind test: 0.78
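Note on the metric keys: the log does not define `mcc`, `fscore` or `jcc`. Assuming they are the Matthews correlation coefficient, the F1 score and the Jaccard index of the positive class, the first Multinomial fold above can be reproduced from a single confusion matrix. The counts below (tp=43, fp=10, fn=12, tn=46) are not printed anywhere in the log; they are inferred from the reported fold-1 precision, recall and accuracy, and the function names are illustrative only. A minimal sketch:

```python
import math


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score of the positive class from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


def jaccard_from_counts(tp: int, fp: int, fn: int) -> float:
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def jaccard_from_f1(f1: float) -> float:
    """For a single binary class, J = F1 / (2 - F1)."""
    return f1 / (2 - f1)


def mcc_from_counts(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Assumed fold-1 confusion matrix (inferred, not logged): 111 test rows.
tp, fp, fn, tn = 43, 10, 12, 46

print("accuracy:", (tp + tn) / (tp + fp + fn + tn))
print("fscore  :", f1_from_counts(tp, fp, fn))
print("jcc     :", jaccard_from_counts(tp, fp, fn))
print("jcc (F1):", jaccard_from_f1(f1_from_counts(tp, fp, fn)))
print("mcc     :", mcc_from_counts(tp, fp, fn, tn))
```

Under these assumptions the printed values agree with the first entries of `test_accuracy`, `test_fscore`, `test_jcc` and `test_mcc` for the Multinomial model, which suggests `test_jcc` carries no information beyond `test_fscore` for this binary task.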
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03798008 0.03545737 0.03012061 0.03877425 0.0317688 0.03150177
0.03185892 0.03116226 0.02460456 0.03471065]
mean value: 0.03279392719268799
key: score_time
value: [0.01224113 0.01254487 0.01250696 0.01257157 0.01247406 0.01254869
0.01248217 0.01216841 0.01248527 0.01256871]
mean value: 0.012459182739257812
key: test_mcc
value: [0.66004053 0.72396756 0.19509708 0.73247207 0.82027988 0.4414112
0.82027988 0.38670108 0.58911518 0.59648483]
mean value: 0.5965849292121154
key: train_mcc
value: [0.78537137 0.78847029 0.1796513 0.78565201 0.79562144 0.52187278
0.78153894 0.52435675 0.67276339 0.72676811]
mean value: 0.6562066371914825
key: test_accuracy
value: [0.82882883 0.85585586 0.54054054 0.86486486 0.90990991 0.67567568
0.90990991 0.64864865 0.78181818 0.78181818]
mean value: 0.7797870597870598
key: train_accuracy
value: [0.89267803 0.89067202 0.53259779 0.89167503 0.89769308 0.72316951
0.89067202 0.7221665 0.82364729 0.85571142]
mean value: 0.8120682689350617
key: test_fscore
value: [0.81904762 0.86666667 0.13559322 0.85714286 0.9122807 0.75342466
0.9122807 0.49350649 0.74468085 0.8125 ]
mean value: 0.7307123768809468
key: train_fscore
value: [0.89246231 0.89765258 0.12734082 0.8875 0.89880952 0.77990431
0.8917577 0.62106703 0.79582367 0.86909091]
mean value: 0.766140885029211
key: test_precision
value: [0.86 0.8 1. 0.9 0.89655172 0.61111111
0.89655172 0.9047619 0.8974359 0.71232877]
mean value: 0.8478741128708063
key: train_precision
value: [0.89516129 0.84452297 0.97142857 0.92407809 0.88823529 0.6468254
0.88212181 0.97424893 0.94490358 0.7953411 ]
mean value: 0.8766867025939546
key: test_recall
value: [0.78181818 0.94545455 0.07272727 0.81818182 0.92857143 0.98214286
0.92857143 0.33928571 0.63636364 0.94545455]
mean value: 0.7378571428571429
key: train_recall
value: [0.88977956 0.95791583 0.06813627 0.85370741 0.90963855 0.98192771
0.90160643 0.45582329 0.68737475 0.95791583]
mean value: 0.766382564325438
key: test_roc_auc
value: [0.82840909 0.85665584 0.53636364 0.86444805 0.90974026 0.67288961
0.90974026 0.65146104 0.78181818 0.78181818]
mean value: 0.7793344155844155
key: train_roc_auc
value: [0.89268094 0.8906045 0.53306412 0.89171315 0.89770505 0.72342879
0.89068297 0.72189962 0.82364729 0.85571142]
mean value: 0.8121137858045409
key: test_jcc
value: [0.69354839 0.76470588 0.07272727 0.75 0.83870968 0.6043956
0.83870968 0.32758621 0.59322034 0.68421053]
mean value: 0.6167813573606694
key: train_jcc
value: [0.80580762 0.81431005 0.068 0.79775281 0.81621622 0.63921569
0.8046595 0.45039683 0.66088632 0.76848875]
mean value: 0.6625733774522629
MCC on Blind test: 0.47
Accuracy on Blind test: 0.67
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.04141402 0.03731918 0.03793001 0.03758025 0.03285313 0.03277278
0.03082371 0.05231643 0.0315969 0.02983832]
mean value: 0.036444473266601565
key: score_time
value: [0.01278472 0.01252007 0.01252818 0.01277971 0.01252961 0.01251364
0.0123961 0.01258612 0.01246643 0.01259232]
mean value: 0.012569689750671386
key: test_mcc
value: [0.72978244 0.69891539 0.74772727 0.61746362 0.75009679 0.78434561
0.7964953 0.77224584 0.76477489 0.67451348]
mean value: 0.7336360627410452
key: train_mcc
value: [0.80907859 0.75937672 0.83892384 0.67525521 0.74664046 0.82363952
0.74461161 0.79763036 0.80597545 0.75551102]
mean value: 0.7756642761750585
key: test_accuracy
value: [0.86486486 0.84684685 0.87387387 0.8018018 0.86486486 0.89189189
0.89189189 0.87387387 0.88181818 0.83636364]
mean value: 0.8628091728091728
key: train_accuracy
value: [0.90371113 0.87462387 0.91875627 0.82848546 0.86459378 0.91173521
0.86258776 0.89267803 0.90280561 0.87274549]
mean value: 0.883272261674804
key: test_fscore
value: [0.86238532 0.83495146 0.87272727 0.7755102 0.88 0.89090909
0.90163934 0.88888889 0.88495575 0.83018868]
mean value: 0.862215600973845
key: train_fscore
value: [0.90679612 0.86368593 0.9211295 0.80634202 0.87760653 0.91252485
0.87646528 0.90120037 0.90424482 0.86150491]
mean value: 0.8831500324767252
key: test_precision
value: [0.87037037 0.89583333 0.87272727 0.88372093 0.79710145 0.90740741
0.83333333 0.8 0.86206897 0.8627451 ]
mean value: 0.8585308160236095
key: train_precision
value: [0.87947269 0.94736842 0.89583333 0.92708333 0.8 0.90354331
0.79541735 0.83418803 0.89105058 0.94497608]
mean value: 0.8818933130847411
key: test_recall
value: [0.85454545 0.78181818 0.87272727 0.69090909 0.98214286 0.875
0.98214286 1. 0.90909091 0.8 ]
mean value: 0.8748376623376624
key: train_recall
value: [0.93587174 0.79358717 0.94789579 0.71342685 0.97188755 0.92168675
0.97590361 0.97991968 0.91783567 0.79158317]
mean value: 0.894959799116305
key: test_roc_auc
value: [0.86477273 0.84626623 0.87386364 0.80081169 0.8637987 0.89204545
0.89107143 0.87272727 0.88181818 0.83636364]
mean value: 0.8623538961038961
key: train_roc_auc
value: [0.90367884 0.87470523 0.91872701 0.82860098 0.86470129 0.91174518
0.86270131 0.89276545 0.90280561 0.87274549]
mean value: 0.8833176392946536
key: test_jcc
value: [0.75806452 0.71666667 0.77419355 0.63333333 0.78571429 0.80327869
0.82089552 0.8 0.79365079 0.70967742]
mean value: 0.7595474774148697
key: train_jcc
value: [0.8294849 0.76007678 0.85379061 0.67552182 0.7819063 0.83912249
0.78009631 0.82016807 0.82522523 0.75670498]
mean value: 0.7922097481345935
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.3187921 0.29844427 0.30731273 0.29335833 0.29589033 0.29770684
0.29465437 0.29198313 0.29960108 0.28603816]
mean value: 0.29837813377380373
key: score_time
value: [0.0161016 0.01719046 0.01659346 0.01645303 0.01635051 0.01641083
0.01618958 0.01644993 0.01580024 0.0157938 ]
mean value: 0.016333341598510742
key: test_mcc
value: [0.89249761 0.85584416 0.87398511 0.82480596 0.86471225 0.84111937
0.85798501 0.93029809 0.85967619 0.87635609]
mean value: 0.8677279843941925
key: train_mcc
value: [0.93607708 0.93628772 0.93240093 0.94638611 0.94833373 0.94431975
0.94237551 0.92853373 0.9465782 0.94425948]
mean value: 0.9405552228633982
key: test_accuracy
value: [0.94594595 0.92792793 0.93693694 0.90990991 0.92792793 0.91891892
0.92792793 0.96396396 0.92727273 0.93636364]
mean value: 0.9323095823095823
key: train_accuracy
value: [0.96790371 0.96790371 0.96589769 0.97291876 0.97392177 0.97191575
0.97091274 0.96389168 0.97294589 0.97194389]
mean value: 0.9700155576951295
key: test_fscore
value: [0.94642857 0.92727273 0.93577982 0.9137931 0.93333333 0.92307692
0.93103448 0.96551724 0.93103448 0.93913043]
mean value: 0.9346401116752753
key: train_fscore
value: [0.96831683 0.96844181 0.96653543 0.97339901 0.9743083 0.97233202
0.97137216 0.96456693 0.97345133 0.97233202]
mean value: 0.9705055844606678
key: test_precision
value: [0.92982456 0.92727273 0.94444444 0.86885246 0.875 0.8852459
0.9 0.93333333 0.8852459 0.9 ]
mean value: 0.9049219328749096
key: train_precision
value: [0.95694716 0.95339806 0.94970986 0.95736434 0.95914397 0.95719844
0.95533981 0.94594595 0.95559846 0.95906433]
mean value: 0.9549710373674181
key: test_recall
value: [0.96363636 0.92727273 0.92727273 0.96363636 1. 0.96428571
0.96428571 1. 0.98181818 0.98181818]
mean value: 0.9674025974025974
key: train_recall
value: [0.97995992 0.98396794 0.98396794 0.98997996 0.98995984 0.98795181
0.98795181 0.98393574 0.99198397 0.98597194]
mean value: 0.9865630860113802
key: test_roc_auc
value: [0.9461039 0.92792208 0.93685065 0.91038961 0.92727273 0.91850649
0.9275974 0.96363636 0.92727273 0.93636364]
mean value: 0.9321915584415584
key: train_roc_auc
value: [0.96789161 0.96788758 0.96587955 0.97290163 0.97393784 0.97193182
0.97092981 0.96391176 0.97294589 0.97194389]
mean value: 0.9700161366910527
key: test_jcc
value: [0.89830508 0.86440678 0.87931034 0.84126984 0.875 0.85714286
0.87096774 0.93333333 0.87096774 0.8852459 ]
mean value: 0.877594962649071
key: train_jcc
value: [0.93857965 0.93881453 0.9352381 0.94817658 0.94990366 0.94615385
0.94433781 0.93155894 0.94827586 0.94615385]
mean value: 0.9427192827315077
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.22250032 0.21884203 0.22811842 0.22957635 0.23531556 0.22903347
0.20518231 0.22516227 0.24410057 0.22989869]
mean value: 0.22677299976348878
key: score_time
value: [0.04108667 0.02354312 0.03398275 0.04121399 0.03230572 0.03941464
0.02867675 0.03275776 0.03820419 0.04190421]
mean value: 0.03530898094177246
key: test_mcc
value: [0.92854828 0.92854828 0.87520354 0.89249761 0.86471225 0.89704631
0.856354 0.94730174 0.86373129 0.87988269]
mean value: 0.8933825983815884
key: train_mcc
value: [0.98998777 0.99198387 0.99599596 0.98398339 0.99799599 0.98997191
0.99399998 0.99399998 0.99002966 0.99400594]
mean value: 0.9921954443123457
key: test_accuracy
value: [0.96396396 0.96396396 0.93693694 0.94594595 0.92792793 0.94594595
0.92792793 0.97297297 0.92727273 0.93636364]
mean value: 0.9449221949221949
key: train_accuracy
value: [0.99498495 0.99598796 0.99799398 0.99197593 0.99899699 0.99498495
0.99699097 0.99699097 0.99498998 0.99699399]
mean value: 0.9960890688096353
key: test_fscore
value: [0.96428571 0.96428571 0.9380531 0.94642857 0.93333333 0.94915254
0.92982456 0.97391304 0.93220339 0.94017094]
mean value: 0.9471650907934566
key: train_fscore
value: [0.995005 0.996 0.998 0.99201597 0.99899699 0.99498495
0.996997 0.996997 0.99501496 0.997003 ]
mean value: 0.9961014855037967
key: test_precision
value: [0.94736842 0.94736842 0.9137931 0.92982456 0.875 0.90322581
0.9137931 0.94915254 0.87301587 0.88709677]
mean value: 0.913963860643924
key: train_precision
value: [0.99203187 0.99401198 0.99600798 0.98807157 0.99799599 0.99398798
0.99401198 0.99401198 0.99007937 0.9940239 ]
mean value: 0.9934234592659856
key: test_recall
value: [0.98181818 0.98181818 0.96363636 0.96363636 1. 1.
0.94642857 1. 1. 1. ]
mean value: 0.9837337662337662
key: train_recall
value: [0.99799599 0.99799599 1. 0.99599198 1. 0.99598394
1. 1. 1. 1. ]
mean value: 0.9987967903678844
key: test_roc_auc
value: [0.96412338 0.96412338 0.93717532 0.9461039 0.92727273 0.94545455
0.92775974 0.97272727 0.92727273 0.93636364]
mean value: 0.9448376623376623
key: train_roc_auc
value: [0.99498193 0.99598595 0.99799197 0.9919719 0.998998 0.99498596
0.99699399 0.99699399 0.99498998 0.99699399]
mean value: 0.9960887638731277
key: test_jcc
value: [0.93103448 0.93103448 0.88333333 0.89830508 0.875 0.90322581
0.86885246 0.94915254 0.87301587 0.88709677]
mean value: 0.9000050838646646
key: train_jcc
value: [0.99005964 0.99203187 0.99600798 0.98415842 0.99799599 0.99001996
0.99401198 0.99401198 0.99007937 0.9940239 ]
mean value: 0.992240108815205
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
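Note on the OOB warnings in the Bagging Classifier run: `BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)` leaves `n_estimators` at the scikit-learn default of 10. With bootstrap sampling, a training row that happens to be drawn into all 10 bootstrap samples never receives an out-of-bag prediction; its row of prediction counts sums to zero, which would also explain the `invalid value encountered in true_divide` message. The sketch below estimates how often that happens. It assumes default bootstrap settings (sampling with replacement, each sample as large as the training set) and a training fold of roughly 1000 rows; both `n_train` and the function name are illustrative assumptions, not values read from the script.

```python
def prob_never_oob(n_samples: int, n_estimators: int) -> float:
    """Probability that one row is in-bag for every estimator.

    A bootstrap sample of size n_samples misses a given row with
    probability (1 - 1/n_samples) ** n_samples, which is close to 1/e.
    """
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators


n_train = 1000  # assumed size of one training fold

for n_estimators in (10, 100):
    p = prob_never_oob(n_train, n_estimators)
    print(
        f"n_estimators={n_estimators}: "
        f"P(row never OOB)={p:.3g}, expected rows affected={n_train * p:.3g}"
    )
```

With 10 estimators roughly 1% of rows are expected to lack an OOB score, so the warning is unsurprising; with 100 estimators the expected count is effectively zero. The cross-validation metrics in this block do not depend on the OOB estimate.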
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.45735979 0.48779058 0.38904047 0.40051913 0.49239159 0.37844515
0.39243364 0.41860628 0.4010551 0.42596316]
mean value: 0.42436048984527586
key: score_time
value: [0.02394366 0.04344606 0.02420187 0.03411937 0.04596186 0.03702545
0.03489757 0.02422857 0.02380872 0.03269696]
mean value: 0.03243300914764404
key: test_mcc
value: [0.86504296 0.85816689 0.805216 0.82480596 0.81228039 0.81228039
0.75979502 0.73528651 0.78590525 0.75346772]
mean value: 0.8012247103768273
key: train_mcc
value: [0.97605307 0.97408004 0.97219678 0.96221188 0.96613365 0.97211189
0.97006835 0.96417189 0.97410635 0.97800501]
mean value: 0.9709138904055334
key: test_accuracy
value: [0.92792793 0.92792793 0.9009009 0.90990991 0.9009009 0.9009009
0.87387387 0.86486486 0.88181818 0.87272727]
mean value: 0.8961752661752662
key: train_accuracy
value: [0.98796389 0.98696088 0.98595787 0.98094283 0.98294885 0.98595787
0.98495486 0.98194584 0.98697395 0.98897796]
mean value: 0.9853584802503703
key: test_fscore
value: [0.93220339 0.92982456 0.90434783 0.9137931 0.90909091 0.90909091
0.8852459 0.87394958 0.89430894 0.88135593]
mean value: 0.9033211055715166
key: train_fscore
value: [0.98807157 0.98709037 0.98613861 0.98120673 0.98311817 0.9860835
0.98507463 0.98214286 0.98709037 0.9890329 ]
mean value: 0.9855049702408853
key: test_precision
value: [0.87301587 0.89830508 0.86666667 0.86885246 0.84615385 0.84615385
0.81818182 0.82539683 0.80882353 0.82539683]
mean value: 0.8476946774139622
key: train_precision
value: [0.98027613 0.97834646 0.97455969 0.96875 0.97249509 0.97637795
0.97633136 0.97058824 0.97834646 0.98412698]
mean value: 0.9760198355928966
key: test_recall
value: [1. 0.96363636 0.94545455 0.96363636 0.98214286 0.98214286
0.96428571 0.92857143 1. 0.94545455]
mean value: 0.9675324675324675
key: train_recall
value: [0.99599198 0.99599198 0.99799599 0.99398798 0.9939759 0.99598394
0.9939759 0.9939759 0.99599198 0.99398798]
mean value: 0.9951859542377929
key: test_roc_auc
value: [0.92857143 0.92824675 0.9012987 0.91038961 0.90016234 0.90016234
0.87305195 0.86428571 0.88181818 0.87272727]
mean value: 0.8960714285714286
key: train_roc_auc
value: [0.98795583 0.98695182 0.98594579 0.98092973 0.9829599 0.98596792
0.9849639 0.98195789 0.98697395 0.98897796]
mean value: 0.9853584679398959
key: test_jcc
value: [0.87301587 0.86885246 0.82539683 0.84126984 0.83333333 0.83333333
0.79411765 0.7761194 0.80882353 0.78787879]
mean value: 0.824214103270005
key: train_jcc
value: [0.97642436 0.9745098 0.97265625 0.9631068 0.96679688 0.97254902
0.97058824 0.96491228 0.9745098 0.97830375]
mean value: 0.9714357173590997
MCC on Blind test: 0.49
Accuracy on Blind test: 0.8
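Note on the block layout: the keys logged for each model (fit_time, score_time, and a test_/train_ pair per metric, ten values each) have the shape of the dictionary returned by scikit-learn's cross_validate when return_train_score=True. The sketch below shows one way such output could be produced. It is not taken from the project scripts: the synthetic data, the column names num_a, num_b and cat_a, the scoring labels, and the use of StratifiedKFold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): two numerical and one categorical feature.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + 0.5 * X["num_b"] > 0).astype(int)

# Same preprocessing structure as the logged pipeline repr:
# MinMaxScaler on numerical columns, OneHotEncoder on categorical ones.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", GaussianProcessClassifier(random_state=42)),
])

# Scoring labels chosen to reproduce the logged key names (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

# Ten stratified folds (assumption; the log only shows ten values per key).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each entry of the returned dictionary is an array with one value per fold, which is why every "value:" line in the log carries ten numbers followed by their mean.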
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.30451274 1.32792377 1.30644488 1.30304623 1.33850408 1.34490895
1.34416556 1.32531881 1.29144812 1.29360223]
mean value: 1.3179875373840333
key: score_time
value: [0.00983834 0.01067233 0.01020646 0.01000905 0.01045823 0.01032877
0.01025462 0.00977778 0.00993848 0.00966263]
mean value: 0.010114669799804688
key: test_mcc
value: [0.94735177 0.93038564 0.91127765 0.8972375 0.88077101 0.87733514
0.89414155 0.93029809 0.86373129 0.91287093]
mean value: 0.9045400572930724
key: train_mcc
value: [0.98803531 0.9900196 0.9900196 0.9900196 0.98605528 0.98803559
0.98803559 0.9900198 0.98409441 0.99201584]
mean value: 0.9886350623024839
key: test_accuracy
value: [0.97297297 0.96396396 0.95495495 0.94594595 0.93693694 0.93693694
0.94594595 0.96396396 0.92727273 0.95454545]
mean value: 0.9503439803439804
key: train_accuracy
value: [0.99398195 0.99498495 0.99498495 0.99498495 0.99297894 0.99398195
0.99398195 0.99498495 0.99198397 0.99599198]
mean value: 0.9942840545685152
key: test_fscore
value: [0.97345133 0.96491228 0.95575221 0.94827586 0.94117647 0.94017094
0.94827586 0.96551724 0.93220339 0.95652174]
mean value: 0.9526257325762123
key: train_fscore
value: [0.9940239 0.99501496 0.99501496 0.99501496 0.99302094 0.99401198
0.99401198 0.995005 0.99204771 0.99600798]
mean value: 0.9943174351825127
key: test_precision
value: [0.94827586 0.93220339 0.93103448 0.90163934 0.88888889 0.90163934
0.91666667 0.93333333 0.87301587 0.91666667]
mean value: 0.9143363851754114
key: train_precision
value: [0.98811881 0.99007937 0.99007937 0.99007937 0.98613861 0.98809524
0.98809524 0.99005964 0.98422091 0.99204771]
mean value: 0.9887014260333787
key: test_recall
value: [1. 1. 0.98181818 1. 1. 0.98214286
0.98214286 1. 1. 1. ]
mean value: 0.9946103896103896
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97321429 0.96428571 0.95519481 0.94642857 0.93636364 0.93652597
0.94561688 0.96363636 0.92727273 0.95454545]
mean value: 0.9503084415584415
key: train_roc_auc
value: [0.9939759 0.99497992 0.99497992 0.99497992 0.99298597 0.99398798
0.99398798 0.99498998 0.99198397 0.99599198]
mean value: 0.9942843518362025
key: test_jcc
value: [0.94827586 0.93220339 0.91525424 0.90163934 0.88888889 0.88709677
0.90163934 0.93333333 0.87301587 0.91666667]
mean value: 0.909801371381051
key: train_jcc
value: [0.98811881 0.99007937 0.99007937 0.99007937 0.98613861 0.98809524
0.98809524 0.99005964 0.98422091 0.99204771]
mean value: 0.9887014260333787
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
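Note on the jcc values: for binary classification the Jaccard index J and the F1 score F are linked by J = F / (2 - F), because F = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN). The sketch below checks that identity against the test_fscore and test_jcc values logged for Gradient Boosting above. It assumes that the jcc key denotes the Jaccard index, which the log does not state explicitly. The confusion counts in the second part are illustrative and are not taken from the log.

```python
import numpy as np

# Per-fold values copied from the Gradient Boosting block of the log.
test_fscore = np.array([0.97345133, 0.96491228, 0.95575221, 0.94827586,
                        0.94117647, 0.94017094, 0.94827586, 0.96551724,
                        0.93220339, 0.95652174])
test_jcc = np.array([0.94827586, 0.93220339, 0.91525424, 0.90163934,
                     0.88888889, 0.88709677, 0.90163934, 0.93333333,
                     0.87301587, 0.91666667])

# Jaccard index derived from F1 via J = F / (2 - F).
derived_jcc = test_fscore / (2.0 - test_fscore)
print(np.allclose(derived_jcc, test_jcc, atol=1e-6))


def jaccard(tp, fp, fn):
    """Jaccard index from confusion counts."""
    return tp / (tp + fp + fn)


def precision(tp, fp):
    """Precision from confusion counts."""
    return tp / (tp + fp)


# With no false negatives (recall = 1.0) the Jaccard index equals precision.
# Counts below are hypothetical, chosen only to illustrate the identity.
tp, fp, fn = 55, 3, 0
print(jaccard(tp, fp, fn) == precision(tp, fp))
```

The second identity explains why train_jcc and train_precision are identical in this block: train_recall is 1.0 in every fold, so there are no false negatives in the training predictions.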
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04034972 0.04601431 0.0428679 0.04311156 0.05195618 0.05159187
0.04370213 0.04357862 0.04359293 0.05074596]
mean value: 0.045751118659973146
key: score_time
value: [0.01296997 0.01311469 0.01335645 0.01352835 0.01321268 0.01309395
0.01326084 0.01323462 0.01333642 0.01326323]
mean value: 0.0132371187210083
key: test_mcc
value: [0.37650043 0.4359956 0.40253236 0.53411996 0.38789039 0.41128197
0.3513061 0.3513061 0.34992711 0.33333333]
mean value: 0.3934193366468438
key: train_mcc
value: [0.37027674 0.37377607 0.38245279 0.55233529 0.50102346 0.45326411
0.3885448 0.39195242 0.37666488 0.38184999]
mean value: 0.41721405362271696
key: test_accuracy
value: [0.62162162 0.65765766 0.64864865 0.72972973 0.66666667 0.66666667
0.61261261 0.61261261 0.60909091 0.6 ]
mean value: 0.6425307125307125
key: train_accuracy
value: [0.62086259 0.62286861 0.62788365 0.7442327 0.70611836 0.67402207
0.63089268 0.6328987 0.6242485 0.62725451]
mean value: 0.6511282344026066
key: test_fscore
value: [0.72368421 0.74324324 0.73469388 0.7826087 0.73758865 0.74482759
0.72258065 0.72258065 0.71895425 0.71428571]
mean value: 0.7345047518636227
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
key: train_fscore
value: [0.7252907 0.72634643 0.72899927 0.79285134 0.77091478 0.75285171
0.73020528 0.73127753 0.72687546 0.72846715]
mean value: 0.7414079649678472
key: test_precision
value: [0.56701031 0.59139785 0.58695652 0.65060241 0.61176471 0.60674157
0.56565657 0.56565657 0.56122449 0.55555556]
mean value: 0.5862566545699067
key: train_precision
value: [0.56898518 0.57028571 0.57356322 0.66666667 0.631242 0.60587515
0.57505774 0.57638889 0.57093822 0.57290471]
mean value: 0.5911907474465508
key: test_recall
value: [1. 1. 0.98181818 0.98181818 0.92857143 0.96428571
1. 1. 1. 1. ]
mean value: 0.9856493506493507
key: train_recall
value: [1. 1. 1. 0.97795591 0.98995984 0.9939759
1. 1. 1. 1. ]
mean value: 0.9961891654795535
key: test_roc_auc
value: [0.625 0.66071429 0.65162338 0.73198052 0.66428571 0.66396104
0.60909091 0.60909091 0.60909091 0.6 ]
mean value: 0.6424837662337662
key: train_roc_auc
value: [0.62048193 0.62248996 0.62751004 0.74399804 0.70640277 0.67434266
0.63126253 0.63326653 0.6242485 0.62725451]
mean value: 0.6511257454668373
key: test_jcc
value: [0.56701031 0.59139785 0.58064516 0.64285714 0.58426966 0.59340659
0.56565657 0.56565657 0.56122449 0.55555556]
mean value: 0.5807679895880729
key: train_jcc
value: [0.56898518 0.57028571 0.57356322 0.65679677 0.62722646 0.60365854
0.57505774 0.57638889 0.57093822 0.57290471]
mean value: 0.5895805426902528
MCC on Blind test: 0.21
Accuracy on Blind test: 0.51
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02735806 0.04222441 0.04203415 0.04227781 0.0215807 0.03210664
0.01891565 0.03945494 0.05434728 0.04150367]
mean value: 0.03618032932281494
key: score_time
value: [0.01939583 0.02228022 0.01953959 0.01950502 0.01263547 0.01240468
0.01373148 0.01946378 0.0194211 0.02454901]
mean value: 0.018292617797851563
key: test_mcc
value: [0.78376623 0.75237443 0.67619361 0.80305531 0.8049036 0.74951538
0.83897362 0.74951538 0.76477489 0.64715023]
mean value: 0.757022267046191
key: train_mcc
value: [0.79982796 0.79935063 0.81028734 0.80841289 0.795357 0.79725623
0.79748273 0.7865219 0.79388846 0.80124844]
mean value: 0.7989633567898495
key: test_accuracy
value: [0.89189189 0.87387387 0.83783784 0.9009009 0.9009009 0.87387387
0.91891892 0.87387387 0.88181818 0.81818182]
mean value: 0.8772072072072072
key: train_accuracy
value: [0.89869609 0.89869609 0.90371113 0.90270812 0.89669007 0.89769308
0.89769308 0.89267803 0.89579158 0.8997996 ]
mean value: 0.8984156879456003
key: test_fscore
value: [0.89090909 0.87931034 0.83928571 0.90265487 0.90598291 0.87931034
0.92173913 0.87931034 0.88495575 0.83333333]
mean value: 0.8816791828897612
key: train_fscore
value: [0.90260366 0.90222652 0.90769231 0.90682037 0.90009699 0.90097087
0.90116279 0.89540567 0.8996139 0.90291262]
mean value: 0.9019505710094796
key: test_precision
value: [0.89090909 0.83606557 0.8245614 0.87931034 0.86885246 0.85
0.89830508 0.85 0.86206897 0.76923077]
mean value: 0.8529303691526108
key: train_precision
value: [0.86988848 0.87265918 0.87245841 0.87084871 0.87054409 0.87218045
0.87078652 0.87238095 0.86778399 0.87570621]
mean value: 0.8715236980915356
key: test_recall
value: [0.89090909 0.92727273 0.85454545 0.92727273 0.94642857 0.91071429
0.94642857 0.91071429 0.90909091 0.90909091]
mean value: 0.9132467532467532
key: train_recall
value: [0.93787575 0.93386774 0.94589178 0.94589178 0.93172691 0.93172691
0.93373494 0.91967871 0.93386774 0.93186373]
mean value: 0.9346125986913586
key: test_roc_auc
value: [0.89188312 0.87435065 0.83798701 0.90113636 0.90048701 0.87353896
0.91866883 0.87353896 0.88181818 0.81818182]
mean value: 0.8771590909090909
key: train_roc_auc
value: [0.89865675 0.89866078 0.90366878 0.90266477 0.89672518 0.89772718
0.89772919 0.89270509 0.89579158 0.8997996 ]
mean value: 0.8984128900371023
key: test_jcc
value: [0.80327869 0.78461538 0.72307692 0.82258065 0.828125 0.78461538
0.85483871 0.78461538 0.79365079 0.71428571]
mean value: 0.7893682628222884
key: train_jcc
value: [0.82249561 0.82186949 0.83098592 0.82952548 0.81834215 0.81978799
0.82010582 0.81061947 0.81754386 0.82300885]
mean value: 0.8214284629540267
MCC on Blind test: 0.59
Accuracy on Blind test: 0.82
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.35478425 0.34318781 0.25689292 0.37095976 0.29249549 0.43200922
0.29593349 0.31350493 0.3035562 0.22859263]
mean value: 0.3191916704177856
key: score_time
value: [0.01961803 0.01251912 0.01954269 0.01953173 0.01246142 0.01285362
0.01274061 0.02135682 0.024894 0.0129559 ]
mean value: 0.016847395896911622
key: test_mcc
value: [0.78376623 0.75237443 0.67762003 0.7137294 0.82182846 0.74951538
0.83897362 0.74951538 0.76477489 0.64715023]
mean value: 0.7499248050806804
key: train_mcc
value: [0.79982796 0.79935063 0.82538044 0.81028734 0.7968424 0.79725623
0.79748273 0.7865219 0.79388846 0.80124844]
mean value: 0.800808652175234
key: test_accuracy
value: [0.89189189 0.87387387 0.83783784 0.85585586 0.90990991 0.87387387
0.91891892 0.87387387 0.88181818 0.81818182]
mean value: 0.8736036036036036
key: train_accuracy
value: [0.89869609 0.89869609 0.91173521 0.90371113 0.89769308 0.89769308
0.89769308 0.89267803 0.89579158 0.8997996 ]
mean value: 0.8994186969726816
key: test_fscore
value: [0.89090909 0.87931034 0.84210526 0.85964912 0.9137931 0.87931034
0.92173913 0.87931034 0.88495575 0.83333333]
mean value: 0.8784415830785542
key: train_fscore
value: [0.90260366 0.90222652 0.91472868 0.90769231 0.9005848 0.90097087
0.90116279 0.89540567 0.8996139 0.90291262]
mean value: 0.9027901829342879
key: test_precision
value: [0.89090909 0.83606557 0.81355932 0.83050847 0.88333333 0.85
0.89830508 0.85 0.86206897 0.76923077]
mean value: 0.8483980614116859
key: train_precision
value: [0.86988848 0.87265918 0.88555347 0.87245841 0.875 0.87218045
0.87078652 0.87238095 0.86778399 0.87570621]
mean value: 0.873439765329131
key: test_recall
value: [0.89090909 0.92727273 0.87272727 0.89090909 0.94642857 0.91071429
0.94642857 0.91071429 0.90909091 0.90909091]
mean value: 0.9114285714285714
key: train_recall
value: [0.93787575 0.93386774 0.94589178 0.94589178 0.92771084 0.93172691
0.93373494 0.91967871 0.93386774 0.93186373]
mean value: 0.9342109922656558
key: test_roc_auc
value: [0.89188312 0.87435065 0.83814935 0.85616883 0.90957792 0.87353896
0.91866883 0.87353896 0.88181818 0.81818182]
mean value: 0.8735876623376624
key: train_roc_auc
value: [0.89865675 0.89866078 0.91170091 0.90366878 0.89772316 0.89772718
0.89772919 0.89270509 0.89579158 0.8997996 ]
mean value: 0.899416302484487
key: test_jcc
value: [0.80327869 0.78461538 0.72727273 0.75384615 0.84126984 0.78461538
0.85483871 0.78461538 0.79365079 0.71428571]
mean value: 0.7842288782373393
key: train_jcc
value: [0.82249561 0.82186949 0.84285714 0.83098592 0.81914894 0.81978799
0.82010582 0.81061947 0.81754386 0.82300885]
mean value: 0.8228423073588096
MCC on Blind test: 0.59
Accuracy on Blind test: 0.82
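Note on reading these blocks programmatically: each metric is logged in a fixed key / value / mean value layout, with the value array wrapped over two lines. The sketch below collects such triples into a dictionary. The function name parse_cv_block is hypothetical, and the sample text is copied from the Ridge ClassifierCV block above. It assumes clean blocks; places where warning text is interleaved inside a value line are not handled.

```python
import re


def parse_cv_block(text):
    """Collect 'key: / value: [...] / mean value:' triples into a dict.

    Hypothetical helper: assumes each value array is enclosed in one pair
    of square brackets, possibly wrapped across several lines.
    """
    pattern = re.compile(
        r"key:\s*(\w+)\s*\n"
        r"value:\s*\[([^\]]*)\]\s*\n"
        r"mean value:\s*([0-9.eE+-]+)"
    )
    results = {}
    for key, values, mean in pattern.findall(text):
        results[key] = {
            "values": [float(v) for v in values.split()],
            "mean": float(mean),
        }
    return results


# Sample lines copied from the Ridge ClassifierCV block of the log.
sample = """key: test_mcc
value: [0.78376623 0.75237443 0.67762003 0.7137294 0.82182846 0.74951538
0.83897362 0.74951538 0.76477489 0.64715023]
mean value: 0.7499248050806804
key: test_accuracy
value: [0.89189189 0.87387387 0.83783784 0.85585586 0.90990991 0.87387387
0.91891892 0.87387387 0.88181818 0.81818182]
mean value: 0.8736036036036036
"""

parsed = parse_cv_block(sample)
for key, entry in parsed.items():
    # Recompute the mean from the per-fold values as a consistency check.
    recomputed = sum(entry["values"]) / len(entry["values"])
    print(key, len(entry["values"]), abs(recomputed - entry["mean"]) < 1e-7)
```

Recomputing the mean from the parsed folds is a cheap way to confirm that a block was read completely before its values are compared across models.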